Author: Robin M. Schmidt
Supervisor: Dr. Massimiliano Mancini
Co-Supervisor: Prof. Dr. Zeynep Akata
Reviewer: Prof. Dr. Philipp Hennig

Eberhard Karls Universität Tübingen

A thesis submitted in fulfillment of the requirements
for the degree of Master of Science (M.Sc.) in Computer Science
in the
Department of Computer Science
Explainable Machine Learning Group

Abstract of thesis entitled "Explainability-aided Domain Generalization for Image Classification"
Submitted by Robin M. Schmidt for the degree of Master of Science (M.Sc.) at the University of Tübingen in April, 2021
Traditionally, for most machine learning settings, gaining some degree of explainability that tries to give users more insights into how and why the network arrives at its predictions, restricts the underlying model and hinders performance to a certain degree. For example, decision trees are thought of as being more explainable than deep neural networks but they lack performance on visual tasks. In this work, we empirically demonstrate that applying methods and architectures from the explainability literature can, in fact, achieve state-of-the-art performance for the challenging task of domain generalization while offering a framework for more insights into the prediction and training process. For that, we develop a set of novel algorithms including DIvCAM, an approach where the network receives guidance during training via gradient based class activation maps to focus on a diverse set of discriminative features, as well as ProDROP and D-TRANSFORMERS which apply prototypical networks to the domain generalization task, either with self-challenging or attention alignment. Since these methods offer competitive performance on top of explainability, we argue that the proposed methods can be used as a tool to improve the robustness of deep neural network architectures.

Copyright © 2021, by Robin M. Schmidt. ALL RIGHTS RESERVED.
I, Robin M. SCHMIDT, declare that this thesis titled "Explainability-aided Domain Generalization for Image Classification" which is submitted in fulfillment of the requirements for the degree of Master of Science in Computer Science represents my own work except where acknowledgements have been made. I further declare that this work has not been previously included as a whole or in parts in a thesis, dissertation, or report submitted to this university or to any other institution for a degree, diploma or other qualifications.

1 Introduction
2 Domain Generalization
2.1 Problem formulation
2.2 Related concepts and their differences
2.2.1 Generic Neural Network Regularization
2.2.2 Domain Adaptation
2.3 Previous Works
2.3.1 Learning invariant features
2.3.2 Model ensembling
2.3.3 Meta-learning
2.3.4 Data Augmentation
2.4 Common Datasets
2.4.1 Rotated MNIST
2.4.2 Colored MNIST
2.4.3 Office-Home
2.4.4 VLCS
2.4.5 PACS
2.4.6 Terra Incognita
2.4.7 DomainNet
2.4.8 ImageNet-C
2.5 Considerations regarding model validation
2.6 Deep-Dive into Representation Self-Challenging
3 Explainability in Deep Learning
3.1 Related topics
3.1.1 Model Debugging
3.1.2 Fairness and Bias
3.2 Previous Works
3.2.1 Visualization
Back-Propagation
Perturbation
3.2.2 Model distillation
Local approximations
Model Translation
3.2.3 Intrinsic methods
Attention mechanism
Text explanations
Explanation association
Prototypes
3.3 Explainability for Domain Generalization
4 Proposed Methods
4.1 Diversified Class Activation Maps (DIVCAM)
4.1.1 Global Average Pooling bias for small activation areas
4.1.2 Smoothing negative Class Activation Maps
4.1.3 Conditional Domain Adversarial Neural Networks
4.1.4 Maximum Mean Discrepancy
4.2 Prototype Networks for Domain Generalization
4.2.1 Ensemble Prototype Network
4.2.2 Diversified Prototypes (ProDrop)
4.2.3 Using Support Sets (D-TRANSFORMERS)
5 Experiments
5.1 Datasets and splits
5.2 Hyperparameter Distributions & Schedules
5.3 Results
5.4 Ablation Studies
5.4.1 Hyperparameter Distributions & Schedules
5.4.2 DIVCAM: Mask Batching
5.4.3 DIVCAM: Class Activation Maps
5.4.4 ProDrop: Self-Challenging
5.4.5 ProDrop: Intra-Loss
6 Conclusion and Outlook
Bibliography
A Domain-specific results
B Additional distance plots
2.1 Meta-distribution D generating source and unseen domain distributions
3.1 Class activation maps across different architectures
3.2 Prototypes for the MNIST and Car dataset
3.3 Image of a clay colored sparrow and its decomposition into prototypes
4.1 Visualization of the DIVCAM training process
4.2 Used class activation maps in DIVCAM-S throughout training
4.3 Domain-agnostic Prototype Network
4.4 Ensemble Prototype Network
4.5 Second data split pairwise prototype distances with
4.6 Second data split pairwise self-challenging prototype distances with
B.1 First data split pairwise prototype distances with
B.2 Third data split pairwise prototype distances with
B.3 First data split pairwise self-challenging prototype distances with
B.4 Third data split pairwise self-challenging prototype distances with
B.5 First data split pairwise prototype distances with
B.6 Second data split pairwise prototype distances with
B.7 Third data split pairwise prototype distances with
B.8 First data split pairwise self-challenging prototype distances with
B.9 Second data split pairwise self-challenging prototype distances with
B.10 Third data split pairwise self-challenging prototype distances with
2.1 Differences in learning setups
2.2 Samples for two different classes across domains for popular datasets
2.3 Reproduced results for Representation Self-Challenging using the official code base
5.1 Performance comparison of the proposed methods on the PACS dataset
5.2 Performance comparison across datasets
5.3 Performance comparison for official PACS splits outside of DOMAINBED
5.4 Hyperparameters and distributions used for the mask batching ablation study
5.5 Hyperparameters and distributions used for the mask ablation study
5.6 Ablation study for the DIVCAM mask batching on the PACS dataset
5.7 Ablation study for the DIVCAM masks on the PACS dataset
5.8 Self-challenging performance comparison for different negative class weights
5.9 Self-challenging performance comparison for different intra factors
A.1 Domain specific performance for the VLCS dataset
A.2 Domain specific performance for the PACS dataset
A.3 Domain specific performance for the Office-Home dataset
A.4 Domain specific performance for the Terra Incognita dataset
A.5 Domain specific performance for the DomainNet dataset
1 Spatial- and Channel-Wise RSC
2 Diversified Class Activation Maps (DIVCAM)
3 Prototype Dropping (ProDROP)
ADAM ADAptive Moment Estimation
AUC Area Under the Curve
CAM Class Activation Maps
CDANN Conditional Domain Adversarial Neural Network
CE Cross-Entropy
CNN Convolutional Neural Network
CMNIST Colored MNIST
DA Domain Adaptation
DANN Domain Adversarial Neural Network
DeepLIFT Deep Learning Important Features
DivCAM Diversified Class Activation Maps
DNN Deep Neural Network
ERM Empirical Risk Minimization
GAN Generative Adversarial Network
Grad-CAM Gradient-weighted Class Activation Maps
HNC Homogeneous Negative Class Activation Maps
I.I.D. Independent and Identically Distributed
LIME Local Interpretable Model-agnostic Explanations
MAML Model-Agnostic Meta-Learning
MMD Maximum Mean Discrepancy
MTAE Multi-Task Autoencoder
NLP Natural Language Processing
ProDrop Prototype Dropping
RKHS Reproducing Kernel Hilbert Space
RMNIST Rotated MNIST
SGD Stochastic Gradient Descent
SVM Support Vector Machine
TAP Threshold Average Pooling
X random variable for input features
Y random variable for output labels
C number of classes
D training dataset
D training distribution
Z feature representation
X input space
Y output space
Θ parameter space
K number of feature maps of the last convolutional layer
L loss term
Ξ set of source environments
D meta distribution
U unlabeled dataset
L labeled dataset
H reproducing kernel Hilbert space
C change vector after applying the mask
U uniform probability matrix
d domain ground truth
P set of source domain pairs
Ω query head
q queries
V support-set values
W query image values
B batch size
γ weight decay factor
Modern machine learning solutions commonly rely on supervised deep neural networks as their default approach, and image classification is one of the tasks most commonly required and implemented in practice. In its simplest form, the networks used combine multiple layers of linear transformations with non-linear activation functions to classify an input image, based on its pixel values, into a discrete set of classes. The chosen architecture, also known as the model, can then learn good parameters that represent how to combine the information extracted from the individual pixel values, using additional labeled information. That is, for a set of images we know the correct class and can automatically guide the network towards well-working parameters by determining how wrong the current prediction is and in which direction we need to update the individual parameters for our network to give a more accurate class prediction.
Obtaining this labeled information, however, is very tedious in practice and either requires a lot of manual human labeling or sufficient human quality assurance for any automatic labeling system. A commonly used approach to overcome this impediment is to combine multiple sources of data, that might have been collected in different settings but represent the same set of classes and have already been labeled a priori. Since the training distribution described by the obtained labeled data is then often different from the testing distribution imposed by the images we observe once we deploy our system, we commonly observe a distribution shift when testing and the network generally needs to do out-of-distribution predictions. Similar behavior can be observed when the model encounters irregular properties during testing such as weather or lighting conditions which have not been captured well by the training data. For many computer vision neural network models, this poses an interesting and challenging task which is known as out-of-distribution generalization where researchers try to improve the predictions under these alternating circumstances for more robust machine learning models.
In this work, we pick up on this challenge and try to improve out-of-distribution generalization capabilities with models and techniques that have been proposed to make deep neural networks more explainable. For most machine learning settings, gaining a degree of explainability that tries to give humans more insights into how and why the network arrives at its predictions restricts the underlying model and hinders performance to a certain degree. For example, decision trees are thought of as being more explainable than deep neural networks but lack performance on visual tasks. In this work, we investigate whether these properties also hold for the out-of-distribution generalization task, or whether we can deploy explainability methods during the training procedure and gain both better performance and a framework that enables more explainability for the users. In particular, we develop a regularization technique based on class activation maps that visualize the parts of an image that led to certain predictions (DIvCAM), as well as prototypical representations that serve as a number of class or attribute centroids which the network uses to make its predictions (ProDrop and D-TRANSFORMERS). Specifically, we deploy these methods for the domain generalization task, where the model has access to images from multiple training domains, each imposing a different distribution, but without access to images from the immediate testing distribution.
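DIvCAM builds on gradient-weighted class activation maps (Grad-CAM). As a rough sketch of that underlying computation (function and variable names are illustrative, not taken from the thesis's code), a map is formed by weighting each feature map of the last convolutional layer with the globally pooled gradient of the class score:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Gradient-weighted class activation map (Grad-CAM style).

    activations: (K, H, W) feature maps of the last conv layer.
    gradients:   (K, H, W) gradient of the target class score
                 with respect to those feature maps.
    """
    # Channel weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))                                  # (K,)
    # Weighted sum over channels, ReLU keeps only positive influence.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to [0, 1] for visualization (guard against all-zero maps).
    return cam / cam.max() if cam.max() > 0 else cam

# Toy example: 2 feature maps on a 2x2 grid.
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 2.0], [0.0, 0.0]]])
grads = np.array([[[0.4, 0.4], [0.4, 0.4]],
                  [[0.1, 0.1], [0.1, 0.1]]])
cam = grad_cam(acts, grads)
```

Regions with high values in `cam` are the ones the network relied on for the class score, which is what DIvCAM uses as a training signal.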
From our experiments, we observe that especially DIvCAM offers state-of-the-art performance on some datasets while providing a framework that enables additional insights into the training and prediction procedure. For us, this is a property that is highly desirable, especially in safety-critical scenarios such as self-driving cars, any application in the medical field such as cancer or tumor prediction, or any other automation robot that needs to operate in a diverse set of environments. Hopefully, some of the methods presented in this work can find application in such scenarios and establish additional trust and confidence in the machine learning system to work reliably. All of our experiments have been conducted within the DOMAINBED domain generalization benchmarking framework and the respective code has been open-sourced.
Machine learning systems often lack out-of-distribution generalization, which causes models to rely heavily on the training distribution and, as a result, perform poorly when presented with a different input distribution during testing. Examples are application scenarios where intelligent systems don't generalize well across health centers if the training data was only collected in a single hospital.
Supervised Learning In supervised learning we are aiming to optimize the predictions of our model. Since we only have access to the underlying distribution through a finite training dataset, the risk is approximated empirically by adding up the loss terms of each sample. One common choice for this loss term is the Cross-Entropy (CE) loss which is shown in Equation (2.3).
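As a reference, the cross-entropy loss over \( C \) classes commonly takes the following form (a sketch consistent with the surrounding notation, where \( f_{c}(x) \) denotes the model's predicted probability for class \( c \); the exact notation of Equation (2.3) may differ):

\[ \mathcal{L}_{\mathrm{CE}}(x, y) = -\sum_{c=1}^{C} \mathbb{1}\left[ y = c \right] \log f_{c}(x) \]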
The occurring minimization problem is then often solved through iterative gradient-based optimization algorithms e.g. SGD [155] or ADAM [98] which perform on-par with recent methods for the non-convex, continuous loss surfaces produced by modern machine learning problems and architectures [163].
On top of that, the model predictor
Domain Generalization The problem of domain generalization (DG) builds on top of this framework, where we now have a set of training environments
Since we don’t have access to
Again, this can be written with the empirical risk as a simple sum over all the environments and their corresponding samples:
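Concretely, with \( \Xi \) the set of source environments and \( n_{e} \) labeled samples \( (x_{i}^{e}, y_{i}^{e}) \) per environment \( e \), this empirical risk can be written as (a sketch in the thesis's notation; the exact indexing and weighting may differ):

\[ \widehat{\mathcal{R}}(\theta) = \sum_{e \in \Xi} \frac{1}{n_{e}} \sum_{i=1}^{n_{e}} \mathcal{L}\left( f_{\theta}\left( x_{i}^{e} \right), y_{i}^{e} \right) \]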
The difference between this approach and ordinary supervised learning is shown at a high level in Table 2.1. It may also be helpful to think about a meta-distribution

Figure 2.1: Meta-distribution
| Setup | Training inputs | Testing inputs |
| Generative learning | \( {U}_{{\xi }_{1}} \) | \( \varnothing \) |
| Unsupervised learning | \( {U}_{{\xi }_{1}} \) | \( {U}_{{\xi }_{1}} \) |
| Supervised learning | \( {L}_{{\xi }_{1}} \) | \( {U}_{{\xi }_{1}} \) |
| Semi-supervised learning | \( {L}_{{\xi }_{1}},{U}_{{\xi }_{1}} \) | \( {U}_{{\xi }_{1}} \) |
| Multitask learning | \( {L}_{{\xi }_{1}},\ldots ,{L}_{{\xi }_{s}} \) | \( {U}_{{\xi }_{1}},\ldots ,{U}_{{\xi }_{s}} \) |
| Continual (lifelong) learning | \( {L}_{{\xi }_{1}},\ldots ,{L}_{{\xi }_{\infty }} \) | \( {U}_{{\xi }_{1}},\ldots ,{U}_{{\xi }_{\infty }} \) |
| Domain Adaptation | \( {L}_{{\xi }_{1}},\ldots ,{L}_{{\xi }_{s}},{U}_{{\xi }_{t}} \) | \( {U}_{{\xi }_{t}} \) |
| Transfer learning | \( {U}_{{\xi }_{1}},\ldots ,{U}_{{\xi }_{s}},{L}_{{\xi }_{t}} \) | \( {U}_{{\xi }_{t}} \) |
| Domain generalization | \( {L}_{{\xi }_{1}},\ldots ,{L}_{{\xi }_{s}} \) | \( {U}_{{\xi }_{t}} \) |
Table 2.1: Differences in learning setups, adapted from [73]. For each environment
Homogeneous and Heterogeneous Sometimes, domain generalization is also divided into homogeneous and heterogeneous subtasks. In homogeneous DG we assume that all domains share the same label space
Single- and Multi-Source There exist subtle differences within this task, referred to as single- or multi-source domain generalization. While multi-source domain generalization refers to the standard setting we have just outlined, single-source domain generalization is a more generic formulation [221]. Instead of relying on multiple training domains to learn models which generalize better, single-source domain generalization aims at learning these representations with access to only one source distribution. Hence, our training domains are restricted to
As already introduced, members of the causality community might know the task of domain generalization under the term learning from multiple environments
In theory, generic model regularization which aims to prevent neural networks from overfitting on the source domain could also improve the domain generalization performance [86]. As such, methods like dropout [177], early stopping [30], or weight decay [136] can have a positive effect on this task when deployed properly. Apart from regular dropout, where we randomly disable neurons in the training phase to stop them from co-adapting too much, a few alternative methods exist. These include dropping random patches of input images (Cutout & HaS) [40, 172] or channels of the feature map (SpatialDropout) [185], dropping contiguous regions of the feature maps (DropBlock) [64], dropping features of high activations across feature maps and channels (MaxDrop) [139], or generalizing the traditional dropout of single units to entire layers during training (DropPath) [104]. There even exist methods like curriculum dropout [130] that deploy scheduling for the dropout probability and therefore softly increase the number of units to be suppressed layerwise during training.
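As a point of reference for these variants, standard (inverted) dropout can be sketched in a few lines (an illustrative sketch, not code from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, training=True):
    """Standard (inverted) dropout: zero each unit with probability p
    during training and rescale the survivors by 1/(1-p), so the
    expected activation matches the unmodified test-time forward pass."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p  # keep a unit with probability 1-p
    return x * mask / (1.0 - p)

x = np.ones((4, 8))
y = dropout(x, p=0.5)  # surviving units are rescaled to 2.0
```

The listed variants differ mainly in *what* gets masked (patches, channels, regions, layers) rather than in this basic mechanism.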
Generally, deploying some of these methods when aiming for out-of-distribution generalization can be a good idea and should definitely be considered for the task of domain generalization.
Domain Adaptation (DA) is often mentioned as a closely related task in domain generalization literature
Earlier methods in this space deploy hand-crafted features to reduce the difference between the source and the target domains [126]. Like that, instance-based methods try to re-weight source samples according to target similarity
Even though domain adaptation and domain generalization both try to reduce the dataset bias, they are not compatible with each other [65]. Hence, domain adaptation methods often cannot be directly used for domain generalization or vice versa [65]. For this work, we don't rely on the simplifying assumptions of domain adaptation, but instead, tackle the more challenging task of DG.
Most commonly, work in domain generalization can be divided into methods that try to learn invariant features, combine domain-specific models in a process called model ensembling, pursue meta-learning, or utilize data augmentation to generate new domains or more robust representations. Since literature in the domain generalization space is broad, we utilized Gulrajani and Lopez-Paz [73, Appendix A] for an overview and to identify relevant literature while individually adding additional works and more detailed information where necessary.
Methods that try to learn invariant features typically minimize the difference between source domains. They assume that with this approach the features will be domain-invariant and therefore will have good performance for unseen testing domains [86].
Some of the earliest works on learning invariant features were kernel methods applied by Muandet, Balduzzi, and Schölkopf [132] where they experimented with a feature transformation that minimizes the across-domain dissimilarity between transformed feature distributions while preserving the functional relationship between original features and targets. In recent years, there have been approaches following a similar kernel-based approach [115, 116], sometimes while maximizing class separability
After that, Ganin et al. [61] introduced Domain Adversarial Neural Networks (DANNs) using neural network architectures to learn domain-invariant feature representations by adding a gradient reversal layer. Recently, their approach got extended to support statistical dependence between domains and class labels [2] or considering one-versus-all adversaries to minimize pairwise divergences between source distributions [5]. Motiian et al. [131] use a siamese architecture to learn a feature transformation that tries to achieve semantical alignment of visual domains while maximally separating them. Other methods are also matching the feature covariance across source domains [150] or take a causal interpretation to match representations of features [122]. Huang et al. [86] have also shown that self-challenging (i.e. dropping features with high gradient values at each epoch) works very well.
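The self-challenging idea of Huang et al. [86] can be sketched as follows (an illustrative simplification on a flat feature vector; the actual method operates spatially and channel-wise on feature maps and is applied per epoch during training):

```python
import numpy as np

def self_challenge(features, gradients, drop_pct=33.3):
    """Self-challenging (RSC-style) masking: zero the features whose
    gradients w.r.t. the true-class score lie above the given
    percentile, forcing the network to rely on the remaining,
    less dominant features."""
    thresh = np.percentile(gradients, 100 - drop_pct)
    mask = gradients < thresh  # keep only low-gradient features
    return features * mask

feats = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
grads = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3])
muted = self_challenge(feats, grads, drop_pct=50.0)  # high-gradient half muted
```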
Matsuura and Harada [127] use clustering techniques to split single-source domain generalization into different domains and then train a domain-invariant feature extractor via adversarial learning. Other works have also deployed similar approaches based on adversarial strategies [37, 92].
Li et al. [112] deploy adversarial autoencoders with maximum mean discrepancy (MMD) [72] to align the source distributions,i.e. for distributions
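For reference, a biased empirical estimate of the squared MMD between two sample sets under an RBF kernel can be sketched as follows (illustrative code; the fixed bandwidth `sigma` is an assumption, not a value from [112]):

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=1.0):
    """Biased empirical estimate of the squared Maximum Mean
    Discrepancy between sample sets X and Y under an RBF kernel."""
    def k(A, B):
        # Pairwise squared distances, then the RBF kernel matrix.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
# Samples from the same distribution give a small MMD,
# samples from shifted distributions a large one.
same = mmd2_rbf(rng.normal(0, 1, (100, 2)), rng.normal(0, 1, (100, 2)))
diff = mmd2_rbf(rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2)))
```

Minimizing such a term between the feature distributions of source domains is one way to encourage domain-invariant representations.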
Ilse et al. [88] extend the variational autoencoder [99] by introducing latent representations for environments
Some methods try to associate model parameters with each of the training domains and combine them, often together with shared parameters, in a meaningful manner to improve generalization to the test domain. Commonly, the number of models in these types of architectures grows linearly with the number of source domains.
The first work to pose and analyze the problem of domain generalization was Blanchard, Lee, and Scott [20]. There, they use classifiers for each sample
Meta-learning approaches provide algorithms that tackle the problem of learning to learn [161, 183]. As such, Finn, Abbeel, and Levine [56] propose a Model-Agnostic Meta-Learning (MAML) algorithm that can quickly learn new tasks with fine-tuning. Li et al. [108] adapt this algorithm for domain generalization (no fine-tuning) such that we can adapt to new domains by utilizing the meta-optimization objective which ensures that steps to improve training domain performance should also improve testing domain performance. Both approaches are not bound to a specific architecture and can therefore be deployed for a wide variety of learning tasks. These approaches recently got extended by two regularizers that encourage general knowledge about inter-class relationships and domain-independent class-specific cohesion [46], to heterogeneous domain generalization [117], or via meta-learning a regularizer that encourages across-domain performance [15].
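The meta-optimization idea of Li et al. [108] can be illustrated on a toy problem (a first-order sketch on quadratic per-domain losses; every constant here is illustrative, not from the paper): a virtual gradient step is taken on the meta-train domain, and the actual update combines the meta-train gradient with the meta-test gradient evaluated after that virtual step, so updates that help the meta-train domain are steered to also help the held-out domain:

```python
import numpy as np

# Toy per-domain loss: L_e(theta) = 0.5 * ||theta - c_e||^2,
# whose gradient is simply theta - c_e.
def grad(theta, c):
    return theta - c

theta = np.zeros(2)
c_train, c_test = np.array([1.0, 0.0]), np.array([0.0, 1.0])
alpha, beta, lr = 0.1, 1.0, 0.1  # inner step, meta weight, learning rate

for _ in range(200):
    g_tr = grad(theta, c_train)
    theta_inner = theta - alpha * g_tr         # virtual step on meta-train
    g_te = grad(theta_inner, c_test)           # meta-test gradient after the step
    theta = theta - lr * (g_tr + beta * g_te)  # first-order combined update
```

The iterate settles at a compromise between the two domain optima rather than overfitting the meta-train domain alone.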
Data Augmentation remains a competitive method for generalizing to unseen domains [211]. Works in this segment try to extend the source environments to a wider range of domains by augmenting the available training environments. However, to deploy an efficient procedure for that, human experts need to consider the data at hand to develop a useful routine [73].
Several works have used the MIXUP [210] algorithm as a method to merge samples from different domains
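A minimal sketch of MIXUP on one pair of samples (illustrative; the domain generalization variants typically draw the pair from two different source domains):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """MIXUP: convex combination of two inputs and of their
    one-hot labels, with the mixing weight drawn from Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

xa, ya = np.ones(4), np.array([1.0, 0.0])   # sample from domain A
xb, yb = np.zeros(4), np.array([0.0, 1.0])  # sample from domain B
x_mix, y_mix = mixup(xa, ya, xb, yb)
```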
There exist several datasets that are commonly used in domain generalization research. Here, we want to introduce the most popular choices as well as interesting datasets to consider. We give an overview over Rotated MNIST, Colored MNIST, Office-Home, VLCS, PACS, Terra Incognita, DomainNet, and ImageNet-C. Currently, the most popular choices include PACS, VLCS, and Office-Home.
The Rotated MNIST (RMNIST) dataset [66] is a variation of the original MNIST dataset [105] where each digit got rotated by degrees
The Colored MNIST (CMNIST) dataset [9] is another variation of the original MNIST dataset [105]. The grayscale images of MNIST are colored in red and green. The respective label corresponds to a combination of digit and color, where the color correlates with the class label with factors
To construct the dataset, Arjovsky et al. [9] first assign an initial label
Overall, the dataset in Gulrajani and Lopez-Paz [73] contains 70000 images from 2 homogeneous classes (green
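The construction can be summarized in code (our paraphrase of Arjovsky et al. [9]; the flip probabilities below are illustrative defaults, the exact per-environment values are stated in the original paper):

```python
import numpy as np

def make_cmnist_env(digits, flip_label=0.25, flip_color=0.1, rng=None):
    """Sketch of a Colored MNIST environment: derive a binary label
    from the digit, inject label noise, then pick a color that
    correlates with the noisy label at rate 1 - flip_color."""
    if rng is None:
        rng = np.random.default_rng(0)
    y0 = (digits < 5).astype(int)                    # initial label from the digit
    y = y0 ^ (rng.random(len(y0)) < flip_label)      # flip the label with prob. flip_label
    color = y ^ (rng.random(len(y)) < flip_color)    # color mostly agrees with the label
    return y.astype(int), color.astype(int)          # e.g. 0 = green, 1 = red
```

Because the color is a more reliable predictor than the digit during training, while its correlation can be reversed in a test environment by choosing a large `flip_color`, a model that latches onto color fails at test time.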
The Office-Home dataset [190] provides 15588 images from 65 categories across 4 domains. The domains include Art, Clipart, Products (objects without a background), and Real-World (captured with a regular camera). Samples from these domains for the classes "alarm-clock" and "bed" can be seen in Table 2.2. On average, each class contains around 70 images with a maximum of 99 images in a category [190]. In Gulrajani and Lopez-Paz [73] they use dimension

Table 2.2: Samples for two different classes across domains for popular datasets
The VLCS dataset [52] utilizes photographic datasets as individual domains. As such, it contains the domains PASCAL VOC (V) [51], LabelMe (L) [158], Caltech101 (C) [111], and SUN09 (S) [33]. In total, there are 10729 images from 5 classes. Samples for the classes "bird" and "car" can be seen in Table 2.2. In Gulrajani and Lopez-Paz [73] they use the dimension
The PACS dataset [107] consists of images from four domains: Photo (P), Art (A), Cartoon (C), and Sketch (S). As such, it extends the previously photo-dominated datasets in domain generalization [107]. It includes seven homogeneous classes (dog, elephant, giraffe, guitar, horse, house, person) across these four domains. Table 2.2 shows samples from the "dog" and "elephant" classes across all domains.
In total, PACS contains 9991 images, obtained by intersecting classes from Caltech256 (Photo), Sketchy (Sketch) [160], TU-Berlin (Sketch) [49], and Google Images (all but Sketch) [107].
The Terra Incognita dataset is a subset of the initial Caltech camera traps dataset proposed by Beery, Horn, and Perona [17]. It contains photographs of wild animals taken by camera traps at different locations (IDs: 38, 43, 46, 100), which represent the domains. As such, the version used by Gulrajani and Lopez-Paz [73] contains 24788 images of 10 classes, each with size
The main data challenges in this dataset include illumination, motion blur, size of the region of interest, occlusion, camouflage, and perspective [17]. Animals are not always salient, and they may be small or far from the camera, so that only partial views of their bodies are available [17].
The DomainNet dataset [141] contains six domains: clipart (48129 images), infographic (51605 images), painting (72266 images), quickdraw (172500 images), real (172947 images), and sketch (69128 images) for 345 classes. In total, it contains 586575 images, accumulated by searching for a category together with a domain name in multiple image search engines and, as an exception, collected from players of the game "Quick Draw!" for the quickdraw domain [141].
To secure the quality of the dataset, they hired 20 annotators for a total of 2500 hours to filter out falsely labeled images [141]. Each category has an average of 150 images for the domains clipart and infographic, 220 images for painting and sketch, and 510 for the real domain [141].
The ImageNet-C dataset [80] contains images from ImageNet [157] corrupted according to 15 corruption types, each with 5 levels of severity, which results in 75 domains. The corruption types fall into the categories "noise", "blur", "weather", and "digital" [80]. Table 2.2 shows a few of the available corruptions at severity 3 for the same image sample of two different classes. The provided corruptions are Gaussian Noise, Shot Noise, Impulse Noise, Defocus Blur, Frosted Glass Blur, Motion Blur, Zoom Blur, Snow, Frost, Fog, Brightness, Contrast, Elastic, and Pixelate [80]. Overall, the dataset provides all 1000 ImageNet classes where each image has the standard dimension of
In traditional supervised learning setups, we train our model on the training dataset, validate its hyperparameters (e.g. the number of layers or hidden units in a neural network) on a separate validation dataset, and finally evaluate our model on an unused test dataset. Notice that the validation dataset should be distributed identically to the test data to properly fit the hyperparameters of our architecture. This is not a straightforward process for domain generalization, as we lack a proper validation dataset with the needed statistical properties. There exist several approaches to this problem, some more grounded than others. Here we give an overview of the three approaches outlined by Gulrajani and Lopez-Paz [73] that are used in their DG framework DOMAINBED.
Training-domain validation set In this approach, each training domain is further split into training and validation subsets, and all validation subsets across domains are pooled into one global validation set. We can then maximize the model's performance on that global validation set to choose the hyperparameters. This approach assumes that training and test distributions are similar.
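A minimal sketch of this pooling (illustrative only; `val_fraction` is a hypothetical choice):

```python
import numpy as np

def training_domain_split(domains, val_fraction=0.2, rng=None):
    """Training-domain validation: split every source domain into
    train/validation subsets and pool the validation subsets across
    domains into one global validation set."""
    if rng is None:
        rng = np.random.default_rng(0)
    train_parts, val_parts = [], []
    for X in domains:
        idx = rng.permutation(len(X))
        n_val = int(len(X) * val_fraction)
        val_parts.append(X[idx[:n_val]])
        train_parts.append(X[idx[n_val:]])
    return train_parts, np.concatenate(val_parts)    # per-domain train, pooled val
```

Hyperparameters are then chosen to maximize accuracy on the pooled validation set, implicitly assuming it resembles the unseen test domain.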
Leave-one-domain-out cross-validation We can train
Test-domain validation set (oracle) A rather statistically biased way of validating the model's hyperparameters is incorporating the test dataset as a validation dataset. Because of this, it is considered bad style and should be avoided or at least explicitly marked. However, one possible mitigation is to restrict access to the test dataset, as done by Gulrajani and Lopez-Paz [73], who prohibit early stopping and only use the last checkpoint.
Other works have also come up with alternative methods to choose the hyperparameters. For example, Krueger et al. [101] validate the hyperparameters on all domains of the VLCS dataset and then apply the settings to PACS while D'Innocente and Caputo [35] use a validation technique that combines probabilities specific to their method.
Since some of our proposed methods use ideas from Representation Self-Challenging (RSC) [86], we explain their approach in more detail here. They deploy two RSC variants, called Spatial-Wise RSC and Channel-Wise RSC, which they randomly alternate between. Both are shown in Algorithm 1 and operate on the features after the last convolutional layer.
First, RSC calculates the gradient of the upper layer with respect to the latent feature representation according to Equation (2.8). Here,
the ground truth.
Afterward, they average-pool the gradients to obtain
Depending on which dimensions are missing to get back to the original size of
Next, Huang et al. [86] compute the (100 - p)th percentile with the threshold value as
Huang et al. [86] apply the computed mask on the feature representation to yield
A positive value represents that the masking for that sample made the classifier less certain about the correct class, while a negative value represents the opposite, i.e. the masking made the classifier more certain about the correct class. Similar to before, Huang et al. [86] calculate the Top-
Finally, we apply the final mask to the features to obtain
Problems Interestingly, since many architectures like ResNet-18/ResNet-50 apply average pooling in their forward pass after the last convolutional layer, naïve Spatial-Wise RSC does not make sense: its pooling is done along the channel dimension, while such architectures additionally average-pool over the spatial dimensions. This results in feature values being spread evenly across the image regardless of the masking. Even though this is not mentioned in their paper, they address the issue in their official repository and propose an alternative computation. For that, they calculate the mean
Algorithm 1: Spatial- and Channel-Wise RSC
Input: Data
while epoch
for every sample (or batch) \( \mathbf{x},\mathbf{y} \) do
Extract features \( \mathbf{z} = \phi \left( \mathbf{x}\right) \) // \( \mathbf{z} \) has shape \( {\mathbb{R}}^{{H}_{\mathbf{z}} \times {W}_{\mathbf{z}} \times K} \)
Compute gradient \( {\mathbf{g}}_{\mathbf{z}} \) w.r.t features according to Equation (2.8)
Compute \( {\widetilde{\mathbf{g}}}_{\mathbf{z},i,j} \) by avg. pooling, using Equation (2.9) or Equation (2.10) with \( {50}\% \) probability each
Duplicate \( {\widetilde{\mathbf{g}}}_{\mathbf{z}} \) along channel/spatial dimension for initial shape
Compute mask \( {\mathbf{m}}_{i,j} \) according to Equation (2.11)
Mask features to obtain \( {\widetilde{\mathbf{z}}}_{p} = \mathbf{m} \odot \mathbf{z}\; \) // Evaluate effect of preliminary mask \( \downarrow \)
Compute change \( \mathbf{c} \) according to Equation (2.12)
Revert masking for specific samples according to Equation (2.13)
Mask features \( \widetilde{\mathbf{z}} = \mathbf{m} \odot \mathbf{z} \)
Compute loss \( \mathcal{L}\left( {w\left( \widetilde{\mathbf{z}}\right) ,\mathbf{y}}\right) \) and backpropagate to whole network
end
end
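The spatial branch of Algorithm 1 can be illustrated on precomputed tensors as follows (a numpy sketch of ours, not the official implementation; `z` and `g_z` stand for the features and their gradients with shape \( {H}_{\mathbf{z}} \times {W}_{\mathbf{z}} \times K \)):

```python
import numpy as np

def spatial_rsc_mask(z, g_z, p=33.3):
    """Spatial-Wise RSC sketch: average-pool the gradients over the
    channel axis, then zero out the spatial positions whose pooled
    gradient exceeds the (100 - p)th percentile, i.e. self-challenge
    by muting the most discriminative locations."""
    g_pooled = g_z.mean(axis=-1)                               # (H, W) channel-averaged gradients
    thresh = np.percentile(g_pooled, 100 - p)                  # percentile threshold
    mask2d = (g_pooled < thresh).astype(z.dtype)               # 0 where the gradient is largest
    mask = np.repeat(mask2d[..., None], z.shape[-1], axis=-1)  # duplicate along channels
    return z * mask, mask
```

Channel-Wise RSC proceeds analogously with the roles of the spatial and channel axes swapped, and the change score from Equation (2.12) decides for which samples the mask is reverted.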
Our Results In an effort to provide fair results that are neither too optimistic nor too penalizing, we run the original RSC code five times for each of the testing environments and report the average performance in Table 2.3.
| Run | \( \mathbf{P} \) | A | C | S |
| --- | --- | --- | --- | --- |
| 1 | 93.23 | 81.69 | 78.11 | 81.14 |
| 2 | 93.41 | 79.44 | 77.38 | 80.55 |
| 3 | 94.37 | 80.08 | 76.58 | 79.18 |
| 4 | 93.71 | 81.49 | 78.84 | 81.90 |
| 5 | 93.95 | 79.39 | 76.75 | 81.19 |
| Average | 93.73 | 80.41 | 77.53 | 80.79 |
| Reported | 95.99 | 83.43 | 80.31 | 80.85 |
Table 2.3: Reproduced results for Representation Self-Challenging using the official code base on the PACS dataset and with a ResNet-18 backbone.
Given these observations, we follow other works such as Nuriel, Benaim, and Wolf [137] and report our reproduced results whenever comparing to RSC.
Machine Learning systems, and especially deep neural networks, are often seen as "black boxes", i.e. they are hard to interpret, and it is often very difficult, if not impossible, for a user to pinpoint how and why they arrive at their predictions. Neural networks commonly lack transparency for human understanding [180]. This property becomes a prominent impediment for intelligent systems deployed in impactful sectors such as employment [24, 149, 216], jurisdiction [74], healthcare [140], or banking loans, where users would like to know the deciding factors behind decisions. As such, we would like systems that are easily interpretable, relatable to the user, provide contextual information about the choice, and reflect the intermediate thinking of the user for a decision [199]. Since these properties are very broad, it is not surprising that researchers in this field pursue very different approaches. For this chapter, we used the field guide by Xie et al. [199] to paint an appropriate overview and properly introduce the different approaches. Commonly, methods try to provide better solutions with respect to Confidence, Trust, Safety, and Ethics to improve the overall explainability of the model [199]:
Confidence The confidence in a machine learning system is high when the model's "reasoning" behind a decision frequently matches the user's. For example, saliency attention maps [87, 138] ensure that semantically relevant parts of an image are considered and therefore increase confidence.
Trust Trust is established when the decision of an intelligent system doesn't need to be validated anymore. Recently, many works have studied the problem of whether a model can safely be adopted
Safety Safety needs to be high for machine learning systems that have an impact on people's lives in any form. As such, the model should perform consistently as expected, prevent choices that may hurt the user or society, have high reliability under all operating conditions, and provide feedback on how the operating conditions influence the behavior.
Ethics Ethics are defined differently depending on the moral principles of each user. In general, though, one can create an "ethics code" on which a system's decisions are based [199]. Any sensitive characteristic, e.g. religion, gender, disability, or sexual orientation, is a feature that should be handled with great care. Similarly, we try to reduce the effect of any feature that serves as a proxy for some type of discrimination process; e.g. living in a specific part of a city, such as New York City's Chinatown, can be a proxy for ethnic background or income. Since this chapter gives a high-level overview of recent advances in explainability for neural networks, also concerning domain generalization, how deeply the reader engages with it may depend on their background.
There exist several concepts related to explainable deep learning. Here, we explicitly cover model debugging, which tries to identify aspects that hinder training and inference, and fairness and bias, which especially tackles the ethics trait by searching for differences between regular and irregular activation patterns to promote robust and trustworthy systems [199].
Model debugging, similar to traditional software debugging, tries to pinpoint aspects of the architecture, data processing, or training process that cause errors [199]. It aims at giving more insights into the model, making faulty behavior easier to resolve. While such approaches help to open the black box of neural network architectures, we treat them separately from the other literature here.
Amershi et al. [7] propose MODELTRACKER, a debugging framework and interactive visualization that displays traditional statistics like the Area Under the Curve (AUC) or confusion matrices. It also shows how close samples are in the feature space and allows users to expand the visualization to show the raw data or annotate them. Alain and Bengio [3] deploy linear classifiers to predict the information content in intermediate layers, where the features of every layer serve as input to a separate classifier. They show that using features from deeper layers improves prediction accuracy and that the level of linear separability increases monotonically with depth. Fuchs et al. [59] introduce neural stethoscopes as a framework for analyzing factors of influence and interactively promoting and suppressing information. They extend the ordinary DNN architecture via a parallel two-layer perceptron at different locations, where the input is the features from any layer of the main architecture. This stethoscope is then trained on a supplemental task and the loss is back-propagated to the main model with weighting factor
To secure model fairness, several definitions have emerged in the literature in recent years. Group fairness [25], also known as demographic parity or statistical parity, aims at equalizing benefits across groups with respect to protected characteristics (e.g. religion, gender, etc.). By definition, if group
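For a binary classifier, the group-fairness criterion can be checked directly (a generic sketch; `group` encodes the protected characteristic):

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Demographic parity: the rate of positive predictions should be
    (near-)equal across protected groups; returns the largest gap."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return float(max(rates) - min(rates))
```

A gap of 0 means all groups receive positive predictions at the same rate; in practice one would test whether the gap lies below a chosen tolerance.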
Methods that try to ensure fairness in machine learning systems can be classified into three approaches, which operate during different stages: pre-processing, in-processing, and post-processing:
Generally, we can divide methods for explainable deep neural networks into visualization, model distillation, and intrinsic methods [199]. While visualization methods try to highlight features that strongly correlate with the output of the DNN, model distillation builds upon a jointly trained "white-box" model that follows the input-output behavior of the original architecture and aims to identify its decision rules [199]. Finally, intrinsic methods are networks designed to explain their own output; hence they aim to optimize both their performance and the respective explanations [199].
Commonly, visualization methods use saliency maps to display the saliency values of the features, i.e. the degree to which the features influence the model's prediction [199]. We can further divide visualization methods into back-propagation and perturbation-based approaches, which determine these values based on the magnitude of the gradient or on differences between modified versions of the input, respectively [199].
These approaches rely on the gradient passed through the network to determine the relevant features. As a simplistic baseline, one can display the partial derivative with respect to each input feature multiplied by its value [199]. This way, Simonyan, Vedaldi, and Zisserman [170] and Springenberg et al. [175] assess the sensitivity of the model to input changes [199]. This can also be done for collections of intermediate layers
Zhou et al. [217] introduce class activation maps (CAMs) which are shown in Figure 3.1 based on global average pooling (GAP) [118]. With GAP, they deploy the following CNN structure at the end of the network: GAP (Convs)
By upsampling the map to the image size, they can visualize the image regions responsible for a certain class [217]. Therefore, every class has its own class activation map. The drawback of their approach is that it can only be applied to networks that use the GAP (Convs)

Figure 3.1: Class activation maps across different architectures: [217]
Selvaraju et al. [164] address this impediment by generalizing CAMs to gradient-weighted class activation maps (Grad-CAMs). Since their approach only requires the final activation function to be differentiable, it is applicable to a broader range of CNN architectures [164, 199]. For that, they compute an importance score
Here,
This computation inherently yields a
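Given the activations of the last convolutional layer and the gradients of the class score with respect to them, the Grad-CAM map reduces to a few lines (a sketch on precomputed arrays; obtaining the gradients themselves requires a backward pass through the network):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM [164]: per-channel importance scores are the spatially
    averaged gradients; the map is the ReLU of the importance-weighted
    sum of activation channels, normalized to [0, 1].
    Both inputs have shape (H, W, K)."""
    alpha = gradients.mean(axis=(0, 1))                        # (K,) channel importance
    cam = np.maximum((activations * alpha).sum(axis=-1), 0.0)  # weighted sum + ReLU
    return cam / (cam.max() + 1e-8)                            # normalize for display
```

The resulting (H, W) map is upsampled to the input resolution and overlaid on the image, as in Figure 3.1.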
Apart from CAMs, there also exist other methods like layer-wise relevance propagation [11, 41, 103, 129], deep learning important features (DeepLIFT) [168], or integrated gradients [181] which are not described in detail here. Please refer to Xie et al. [199] or the original works for more information.
Perturbation methods alter the input features to compute their respective relevance for the model's output by comparing the differences between the original and perturbed versions.
Zeiler and Fergus [206] sweep a gray patch over the image to determine how the model reacts to occluded areas. Once an area with a high correlation to the output is covered, the prediction performance drops [199, 206]. Li, Monroe, and Jurafsky [113] deploy a similar idea for NLP tasks, where they erase words and measure the influence on the model's performance. Fong and Vedaldi [57] define three perturbations: i) replacing patches with a constant value, ii) adding noise to a region, and iii) blurring the area [57, 199]. Zintgraf et al. [220] propose a method based on Robnik-Sikonja and Kononenko [156] where they calculate the relevance of a feature for class
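The gray-patch sweep of Zeiler and Fergus [206] can be sketched as follows (our illustration; `predict` is assumed to return the model's scalar score for the originally predicted class):

```python
import numpy as np

def occlusion_map(image, predict, patch=4, fill=0.5):
    """Occlusion sweep: slide a gray patch over the image and record
    how much the model's score drops; a large drop marks a region the
    model relies on."""
    base = predict(image)
    H, W = image.shape[:2]
    heat = np.zeros((H // patch, W // patch))
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill   # gray out one patch
            heat[i // patch, j // patch] = base - predict(occluded)
    return heat
```

Regions whose occlusion causes the largest score drop are the ones the model depends on most for its prediction.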
Model distillation methods allow for post-training explanations, where we learn a distilled model which imitates the original model's decisions on the same data [199]. It has access to information from the initial model and can therefore give insights about the features and output correlations [199]. Generally, we can divide these methods into local approximation and model translation approaches. These either replicate the model behavior on a small subset of the input data, based on the idea that the mechanisms a network uses to discriminate in a local area of the data manifold are simpler (local approximation), or stick to using the entire dataset with a smaller model (model translation) [199].
Even though it may seem unintuitive to pursue approaches that don't explain every decision made by the DNN, practitioners often want to interpret decisions made for a specific data subset e.g. employee performance indicators for those fired with poor performance [199]. One of the most popular local approximations is the method proposed by Ribeiro, Singh, and Guestrin [153] called local interpretable model-agnostic explanations (LIME). They propose a notation where from an unexplainable global model
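The core of a LIME-style local approximation fits in a few lines (a simplified sketch of ours, not the official LIME library: we perturb with Gaussian noise and fit a proximity-weighted linear surrogate whose coefficients serve as local attributions):

```python
import numpy as np

def local_surrogate(f, x, n_samples=500, scale=0.1, rng=None):
    """Fit a weighted linear model to the black-box f in the
    neighbourhood of x; the coefficients are local attributions."""
    if rng is None:
        rng = np.random.default_rng(0)
    Z = x + scale * rng.normal(size=(n_samples, len(x)))          # local neighbourhood
    y = np.array([f(z) for z in Z])                               # query the black box
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * scale ** 2))  # proximity kernel
    A = np.hstack([Z, np.ones((n_samples, 1))])                   # linear model with bias
    W = np.diag(w)
    coef, *_ = np.linalg.lstsq(A.T @ W @ A, A.T @ W @ y, rcond=None)
    return coef[:-1]                                              # drop the bias term
```

For a linear black box such as `f(z) = 3*z[0] - 2*z[1]`, the recovered attributions are (3, -2) up to numerical precision; for a deep network they only hold in the neighbourhood of `x`.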
The idea of model translation is to mimic the behavior of the original deep neural network on the whole dataset, contrary to local approximations which only use a smaller subset. Some works have tried to distill neural networks into decision trees
Finally, intrinsic methods jointly output an explanation in combination with their prediction. In an ideal world, such methods would be on par with state-of-the-art models without explainability. This approach introduces an additional task that gets jointly trained with the original task of the model [199]. The additional task usually tries to provide either text explanations [26, 79, 81, 207], an explanation association
The attention mechanism [189] draws motivation from human visual focus and peripheral perception [162]. With these, humans can focus on certain regions to achieve high resolution while adjacent objects are perceived at a rather low resolution [162]. In the attention mechanism, we learn a conditional distribution over given inputs using weighted contextual alignment scores (attention weights) [199]. These allow insights into how strongly different input features are considered during model inference [199]. The alignment scores can be computed in different ways, for example content-based [71], additive [13], based on the matrix dot-product [121], or as a scaled version of the matrix dot-product [189]. Especially due to the transformer architecture [189], attention has been shown to improve neural network performance, originally in natural language processing [23, 39, 102] but more recently also in image classification and other computer vision tasks
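The scaled dot-product variant [189] is compact enough to state directly (a numpy sketch for a single head, without masking or batching):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: alignment scores are the scaled
    query-key dot products, softmax-normalized into attention weights
    that mix the value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # scaled alignment scores
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ V, weights
```

The returned weight matrix is exactly the quantity inspected for explainability: row `i` shows how strongly query `i` attends to each key.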
Text explanations are natural language outputs that explain the model decision using a form like "This image is of class
Latent features or input elements that are combined with human-understandable concepts are classified under explanation associations. Such explanations either combine input or latent features with semantic concepts, associate the model prediction with a set of input elements, or utilize object saliency maps to visualize relevant image parts [199].
Finally, model prototype approaches are specifically designed for classification tasks [18, 97, 147, 198]. The term prototype in few- and zero-shot learning settings refers to points in the feature space representing a single class [114]. In such methods, the distance to the prototype determines how an observation is classified. Prototypes are not limited to a single observation but can also be obtained using a combination of observations or latent representations [199]. A criticism, on the other hand, is a data instance that is not well represented by the set of prototypes [128]. To obtain explainability using prototypes, one can trace the reasoning path for the prediction back to the learned prototypes [199]. Li et al. [114] use a prototype layer to deploy an explainable image classifier. They propose an architecture with an autoencoder and a prototype classifier. This prototype classifier calculates the

Figure 3.2: Prototypes for the MNIST (left) and Car (right) dataset: [114]
Chen et al. [32] introduce a prototypical part network (ProtoPNet) which has similar components to Li et al. [114], namely a convolutional neural network projecting onto a latent space and a prototype classifier. The approach chosen by Chen et al. [32] is different as the prototypes are more fine-grained and represent parts of the input image [199]. Hence, their model associates image patches with prototypes for explanations [199]. Figure 3.3 illustrates this approach for bird species classification.

Figure 3.3: Image of a clay-colored sparrow and its decomposition into prototypes [32].
As a general framework, we would like to learn
Chen et al. [32] compute the maximum similarity score for each prototype unit
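As a concrete illustration of this maximum similarity computation, the following sketch assumes ProtoPNet's log-ratio similarity over squared distances and 1×1 latent patches; the function name and the epsilon value are our own, not part of the original implementation.

```python
import torch

def prototype_similarity(z, prototypes, eps=1e-4):
    # z: latent feature map (H, W, K); prototypes: (P, K).
    # For each prototype, take the maximum similarity over all latent patches;
    # the log-ratio turns a small squared distance into a large similarity.
    patches = z.reshape(-1, z.shape[-1])          # (H*W, K) 1x1 latent patches
    d2 = torch.cdist(patches, prototypes) ** 2    # squared L2 distances
    sim = torch.log((d2 + 1.0) / (d2 + eps))
    return sim.max(dim=0).values                  # (P,) max similarity per prototype
```

A prototype that exactly matches some latent patch receives a similarity near log(1/eps), while distant prototypes score near zero.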
where the squared
To enforce that every class
where the cluster loss
Note that this is only the first part of a multi-step training procedure, where Equation (3.5) solely optimizes the parameters of the featurizer
While Li et al. [114] need a decoder for visualizing the prototypes, Chen et al. [32] do not require this component since they visualize the closest latent image patch across the full training dataset instead of directly visualizing the prototype [32]. They also show that, when combining several of their networks into a larger network, their method is on par with the best-performing deep models [32].
To the best of our knowledge, the only work using explanations for domain generalization is by Zunino et al. [221]. They introduce a saliency-based approach utilizing a 2D binary map of pixel locations for the ground-truth object segmentation as input. This map contains a 1 at pixel locations where the class-label object is present and a 0 otherwise. Even though they were able to show that their method focuses better on relevant regions, we identify the additional annotations of class-label objects as a major drawback, which we solve in this work. Indeed, we not only avoid the use of annotations by directly resorting to Grad-CAMs (see Section 4.1), but our idea is also different in spirit. In fact, we do not force the network to focus on relevant regions as in [221]; instead, we want the network to use i) different explanations for the same objects (to achieve better generalization) and ii) explanations that are consistent across domains (to avoid overfitting a single input distribution).
In order to apply some of the previously mentioned topics from the explainability literature to the domain generalization task, we specifically investigate the usage of gradient class activation maps from Section 3.2.1, as well as prototypes from Section 3.2.3. Our methods based upon these approaches are described in Section 4.1 (DivCAM), Section 4.2 (ProDrop), and Section 4.2.3 (D-TRANSFORMERS), respectively.
In Section 2.6 we introduced the concept of Representation Self-Challenging for domain generalization, while Section 3.2.1 introduced class activation maps and specifically Grad-CAM. It is quite easy to see that the importance scores from Equation (3.2) in Grad-CAM
Despite the effectiveness of the model, we believe the approach of Huang et al. [86] does not fully exploit the relation between a feature vector and the actual content of the image. We argue (and experimentally demonstrate) that we can directly use CAMs to construct the self-challenging task. In particular, while the raw target gradient represents the significance of each channel at each spatial location for the prediction, CAMs allow us to better capture the actual importance of each image region. Thus, performing the targeted masking on the highest CAM values means explicitly excluding the most relevant regions of the image that were used for the prediction, forcing the model to focus on other (and interpretable) visual cues for recognizing the object of interest.
Therefore, as an intuitive baseline, we propose Diversified Class Activation Maps (DivCAM), combining the two approaches as shown in Algorithm 2 and visualized at a high level in Figure 4.1. During each step of the training procedure, we extract the features, compute the gradients with respect to the features as in Equation (2.8), and perform spatial average pooling as in Equation (2.10) to yield

Figure 4.1: Visualization of the DivCAM training process.
Based on these maps and similar to Equation (2.11), we compute a mask
As the class activation maps and the corresponding masks are averaged along the channel dimension to be specific to each spatial location, we duplicate the mask along all channels to yield a mask with the same size as the features
where
Intuitively, constantly applying this masking to all samples within each batch disregards important features and results in relatively poor performance, as the network is not able to learn discriminative features in the first place. Therefore, applying the mask only to certain samples within each batch, as mentioned by Huang et al. [86, Section 3.3], should yield better performance. For convenience, we call this process mask batching. On top of that, one could schedule the mask batching with an increasing factor (e.g. a linear schedule) such that masking gets applied more often in the later training epochs, where discriminative features have already been learned. We apply the mask only if the sample
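The masking step with mask batching can be sketched compactly. This is a minimal illustration, not the thesis implementation: the channels-last layout, the percentile-based threshold, and the probability values are assumptions standing in for Equations (4.1)-(4.4).

```python
import torch

def divcam_mask(z, grads, drop_ratio=0.33, apply_prob=0.33):
    # z, grads: features and their class-score gradients, shape (B, H, W, K).
    w = grads.mean(dim=(1, 2), keepdim=True)                  # spatial average pooling
    cam = torch.relu((w * z).sum(dim=-1))                     # CAM per sample, (B, H, W)
    thresh = torch.quantile(cam.flatten(1), 1.0 - drop_ratio, dim=1)
    mask = (cam < thresh.view(-1, 1, 1)).float()              # drop most relevant regions
    skip = (torch.rand(z.shape[0]) > apply_prob).view(-1, 1, 1)  # mask batching
    mask = torch.where(skip, torch.ones_like(mask), mask)
    return z * mask.unsqueeze(-1)                             # mask repeated over channels
```

Per-sample skipping implements mask batching: most samples pass through unmasked, so discriminative features can still be learned.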
where
Algorithm 2: Diversified Class Activation Maps (DivCAM)
Input: Data
while epoch
for every batch \( \mathbf{x},\mathbf{y} \) do
Extract features \( \mathbf{z} = \phi \left( \mathbf{x}\right) \) // \( \mathbf{z} \) has shape \( {\mathbb{R}}^{{H}_{\mathbf{z}} \times {W}_{\mathbf{z}} \times K} \)
Compute \( {\mathbf{g}}_{\mathbf{z},c} \) with Equation (2.8)
Compute \( {\widetilde{\mathbf{g}}}_{\mathbf{z},c}^{k} \) with Equation (2.10) // \( {\widetilde{\mathbf{g}}}_{\mathbf{z}} \) has shape \( {\mathbb{R}}^{1 \times 1 \times K} \)
Compute \( {\mathbf{M}}_{c} \) with Equation (4.1) // \( \mathbf{M} \) has shape \( {\mathbb{R}}^{{H}_{\mathbf{z}} \times {W}_{\mathbf{z}} \times 1} \)
Compute \( {\mathbf{m}}_{c,i,j} \) with Equation (4.2)
Repeat mask along channels // Afterwards \( \mathbf{m} \) has shape \( {\mathbb{R}}^{{H}_{\mathbf{z}} \times {W}_{\mathbf{z}} \times K} \)
Adapt \( {\mathbf{m}}_{c,i,j} \) with Equation (4.4)
Compute \( \widetilde{\mathbf{z}} \) with Equation (4.3)
Backpropagate loss \( {\mathcal{L}}_{ce}\left( {w\left( \widetilde{\mathbf{z}}\right) ,\mathbf{y}}\right) \)
end
end
To try to improve the effectiveness of our CAM-based regularization approach, we can borrow practices from the weakly-supervised object localization literature. In particular, we explore the use of Homogeneous Negative CAMs (HNC) [180] and Threshold Average Pooling (TAP) [12]. Both methods improve the performance of ordinary CAMs and focus them better on the relevant aspects of an image. See Section 5.4.3 for an evaluation of these variants.
According to Bae, Noh, and Kim [12], one problem of traditional class activation maps is that the activated areas of each feature map differ across channels, because the channels capture different class information that is not properly reflected in the global average pooling operation. Since every channel is globally averaged, smaller feature activation areas result in smaller globally averaged values despite a similar maximum activation value. This does not necessarily mean that one of the features is more relevant for the prediction, but can simply be caused by a large area with small activations. To combat this problem when comparing two channels, the weight
When incorporating this into DivCAM, the global average pooling applied after self-challenging is changed to a threshold average pooling. Generally, this plug-in replacement can be seen as a trade-off between global max pooling, which is better at identifying the important activations of each channel, and global average pooling, which has the advantage of expanding the activation to broader regions, allowing the loss to backpropagate.
Based on the analysis of Sun et al. [180], negative class activation maps, i.e. the class activation maps for classes other than the ground truth, often have false activations even when those classes are not present in an image. To reduce this localization error, they propose a loss function which adds a weighted homogeneous negative class (HNC) loss term on top of the existing cross-entropy loss, as shown in Equation (4.6), where

Figure 4.2: Class activation maps used by DivCAM-S at update steps 300/5000, 2700/5000, and
Sun et al. [180] propose two approaches for implementing
Here,
Generally, with this approach we add two hyperparameters in the form of the weighting parameter
Since we use Grad-CAMs instead of ordinary CAMs in DivCAM, naïvely applying this would require computing the gradient for every negative class in the set
Equation (4.10) where
To speed up training for tasks with a large number of classes, we approximate the loss by summing the negative class confidences before backpropagating, as shown in Equation (4.11). This amounts to considering all negative classes within the set
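The core of this approximation is a single scalar whose backward pass replaces the per-class backward passes. The helper below is an illustrative sketch of that scalar, not the thesis implementation; the function name is ours.

```python
import torch
import torch.nn.functional as F

def summed_negative_confidence(logits, targets):
    # logits: (B, C); targets: (B,). Zero out the ground-truth confidence and
    # sum the rest; one backward() on this scalar stands in for the |C|-1
    # separate backward passes that the exact per-class loss would need.
    probs = F.softmax(logits, dim=1)
    neg = probs.scatter(1, targets.unsqueeze(1), 0.0)
    return neg.sum()
```

Calling `.backward(retain_graph=True)` on this scalar yields one joint gradient over all negative classes, from which a single combined negative Grad-CAM can be formed.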
To finally implement this into DivCAM, we simply substitute the current loss in Line 12 of Algorithm 2
Next, we can try to utilize domain information in DivCAM by aligning the distributions of class activation maps produced by the same class across domains. We want these distributions to align as closely as possible such that we cannot identify which domain produced which class activation map. For that, we can utilize methods previously introduced in Section 2.3.1; in particular, we explore minimizing the sample maximum mean discrepancy introduced in Equation (2.7) and using a conditional domain adversarial neural network (CDANN). See Section 5.4.3 for an evaluation of these variants.
We combine the conditional domain adversarial neural network (CDANN) approach, originally introduced by Li et al. [116], with DivCAM to align the distributions of CAMs across domains. For that, we try to predict the domain to which a class activation map belongs by passing it to a multi-layer perceptron
During each training step, we either update the discriminator, i.e. the predictor for the domain, or the generator, i.e. the main network including featurizer and classifier. The discriminator loss inherently includes a
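The discriminator side of this alternating scheme can be sketched as follows. The MLP architecture, the flattened 7×7 CAM input, and the three training domains are illustrative assumptions, and the class-conditioning that distinguishes CDANN from plain DANN is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed setup: CAMs of spatial size 7x7 and three training domains.
discriminator = nn.Sequential(nn.Linear(49, 64), nn.ReLU(), nn.Linear(64, 3))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def discriminator_step(cams, domain_labels):
    # Train the domain predictor on flattened CAMs; in the alternating
    # (generator) step, the featurizer and classifier would instead be
    # updated to make the CAMs indistinguishable across domains.
    d_opt.zero_grad()
    loss = F.cross_entropy(discriminator(cams.flatten(1)), domain_labels)
    loss.backward()
    d_opt.step()
    return loss.item()
```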
Given two samples

Figure 4.3: Domain-agnostic Prototype Network
In simpler terms, we map features into a reproducing kernel Hilbert space (RKHS)
Since the choice of kernel function can have a significant impact on the distance metric, we adopt the approach of Li et al. [112] and use a mixture kernel by averaging over multiple choices of
With this approach, we inherently align the computed masks by aligning the individual samples from different domains, aiming to produce domain-invariant masks. This procedure can be applied at different levels, e.g. on the features, the class activation maps, or the masked class activation maps. In Section 5.4.3, we only provide results for the feature level due to the effectiveness that similar approaches showed in DomainBed. However, we observe a similar trend for the other application levels as well.
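The mixture-kernel MMD itself reduces to a few lines; the Gaussian bandwidth values below are illustrative placeholders, not the ones used in the experiments.

```python
import torch

def mixture_mmd(x, y, gammas=(0.1, 1.0, 10.0)):
    # x, y: (N, K) and (M, K) feature batches from two domains. The kernel is
    # an average of Gaussian kernels over several bandwidths, so no single
    # bandwidth choice dominates the distance estimate.
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return sum(torch.exp(-g * d2) for g in gammas) / len(gammas)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()
```

The value is zero for identical sample sets and grows as the two domains' feature distributions drift apart, so it can be added directly to the training loss as an alignment penalty.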
Another approach to combine explainability methods with the task of domain generalization is to use the prototype method outlined in Section 3.2.3. In particular, we can directly adapt the approach of Chen et al. [32] as a baseline, where we associate each class with a pre-defined number of prototypes. The cluster and separation losses from Equation (3.6) ensure that each prototype resembles a prototypical attribute of its associated class, and we minimize them according to Equation (3.5).
For our application scenario, this prototype layer is used after the domain-agnostic featurizer

Figure 4.4: Ensemble Prototype Network
Following the intuition provided by works that utilize model ensembling, described in Section 2.3.2, we can use domain information by up-scaling the network to use a separate prototype layer for each domain. For the PACS dataset, this corresponds to having three prototype layers, one for each training domain, e.g. a photo, an art, and a cartoon prototype layer when predicting sketch images. Each prototype layer is only trained with images from its corresponding domain.
As shown in Figure 4.4, we associate each domain with both a prototype layer and a classifier which takes the similarity scores of that domain's prototypes as input. During training, we only feed images of the associated domain to the respective prototype layer and classifier to enforce this domain correspondence. The aggregation weights of the final linear layer are set to a one-hot encoding representing the correct domain. During testing, we can then feed the new unseen domain to each domain's prototype layer, allowing each domain's prototypes to influence the final prediction.
There exist multiple strategies for setting the aggregation weights during this stage. The simplest version is to set the influence of each domain uniformly, i.e. if we have three training domains, the connections from each domain would have the weight
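The switch from the one-hot training pattern to a uniform test-time aggregation can be sketched as follows; the layout (per-domain class scores concatenated into one bias-free linear layer) and the dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

def set_uniform_aggregation(layer, n_domains, n_classes):
    # layer: nn.Linear(n_domains * n_classes, n_classes, bias=False) holding
    # the aggregation weights. During training this matrix is a one-hot
    # (domain-selecting) pattern; at test time every domain head receives
    # the equal weight 1 / n_domains.
    w = torch.zeros(n_classes, n_domains * n_classes)
    for d in range(n_domains):
        w[:, d * n_classes:(d + 1) * n_classes] = torch.eye(n_classes) / n_domains
    with torch.no_grad():
        layer.weight.copy_(w)

agg = nn.Linear(3 * 7, 7, bias=False)   # e.g. three training domains, seven PACS classes
set_uniform_aggregation(agg, 3, 7)
```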
Lastly, we also experiment with an ensemble variant that is specific to prototype layers. Instead of only pre-defining a number of prototypes for each class, as in the domain-agnostic prototype layer outlined in Section 4.2, we can also pre-define a domain correspondence for each prototype. Training is then done by passing each domain separately to the prototype layer and masking the prototype outputs that do not correspond to the current environment. The cluster and separation losses are adapted to an average over the individual environments as:
where
we optimize the final loss of our ensemble model as:
While all of these ensemble variants should work given the intuition from previous works using model ensembles for learned domain-specific latent spaces, our additional experiments on the proposed variants show that these assumptions do not hold for prototype networks. In our experiments, every prototype ensemble was consistently outperformed by a single domain-agnostic prototype layer.
Initial experiments with both domain-agnostic and domain-specific prototype layers led to unsatisfactory results. To investigate this behavior, we analyze the pairwise prototype
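The analysis reduces to computing, per class, the pairwise distances between its learned prototypes. A minimal L2-only sketch (the thesis additionally compares other distance metrics):

```python
import torch

def intra_class_prototype_distances(prototypes, classes):
    # prototypes: (P, K) learned prototype vectors; classes: (P,) the class
    # index each prototype belongs to. Returns one symmetric pairwise L2
    # distance matrix per class, as visualized in Figures 4.5 and 4.6.
    out = {}
    for c in classes.unique():
        p = prototypes[classes == c]
        out[int(c)] = torch.cdist(p, p)
    return out
```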
Through the
The results of this analysis can be seen in Figure 4.5 and Figure 4.6 for the first data split and a negative weight of

Figure 4.5: Pairwise learned prototype distances of the best-performing model on each testing domain with a negative weight of
From a design standpoint, we would like the prototypes within each class to be reasonably well spaced out in the latent space such that they can resemble different discriminative attributes of each class. That is, we would like the network to utilize all the prototypes and not rely only on a small subset of prototypes or discriminative features for its prediction. The distances of these prototypes to the prototypes of the other classes, however, should not be restrained in any way and should be learned automatically. For example, when predicting different bird species, this allows the network to place similar head prototypes of different classes closer together. The existing cluster and separation losses enforce that each prototype associated with a class is close to at least one latent patch of that class while maximizing the distance to the prototypes of other classes. However, they do not enforce that each prototype associated with a class acts on a different discriminative feature.
One approach to enforce this behavior is to incorporate the self-challenging method previously applied in DivCAM into the presented prototype network, resulting in a novel algorithm we call prototype dropping (ProDrop), which is described in Algorithm 3. In essence, we extract features by passing our input images to the featurizer
Algorithm 3: Prototype Dropping (ProDrop)
Input: Data
while epoch
for every batch \( \mathbf{x},\mathbf{y} \) do
Extract features \( \mathbf{z} = \phi \left( \mathbf{x}\right) \) // \( \mathbf{z} \) has shape \( {\mathbb{R}}^{{H}_{\mathbf{z}} \times {W}_{\mathbf{z}} \times K} \)
Compute \( {g}_{{\mathbf{p}}_{j}}\left( \mathbf{z}\right) \) with Equation (3.4)
Compute \( {\mathbf{m}}_{c,j} \) with Equation (4.22)
Adapt \( {\mathbf{m}}_{c,j} \) with Equation (4.4)
Compute \( {\widetilde{g}}_{\mathbf{p}}\left( \mathbf{z}\right) \) with Equation (4.23)
Backpropagate loss \( {\mathcal{L}}_{ce}\left( {w\left( {{\widetilde{g}}_{\mathbf{p}}\left( \mathbf{z}\right) }\right) ,\mathbf{y}}\right) + {\lambda }_{4}{\mathcal{L}}_{\text{clst }} + {\lambda }_{5}{\mathcal{L}}_{\text{sep }} \)
end
end
where
In practice, all of these operations can be efficiently implemented using torch.quantile, torch.lt, and torch.logical_or on small tensors, none of which poses significant computational overhead.
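How these three calls combine in the masking step can be sketched as follows; the quantile-based threshold and the per-sample skip flag are our assumptions mirroring Equations (4.22) and (4.4), not the exact thesis code.

```python
import torch

def prodrop_mask(sims, drop_ratio=0.33, skip_sample=None):
    # sims: (B, P) similarity score of every prototype for every sample.
    # Prototypes above the per-sample quantile are dropped (self-challenging);
    # samples flagged in skip_sample keep all prototypes, i.e. mask batching.
    thresh = torch.quantile(sims, 1.0 - drop_ratio, dim=1, keepdim=True)
    keep = torch.lt(sims, thresh)
    if skip_sample is not None:
        keep = torch.logical_or(keep, skip_sample.unsqueeze(1))
    return sims * keep
```

Because the prototypes a sample relies on most are the ones zeroed out, the classifier is pushed to spread its prediction across the remaining prototypes.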
The effect of this approach on the pairwise prototype distances can be seen in Figure B.3 as well as Appendix B. We observe that, even though self-challenging helps to boost the overall performance (see Section 5.4.4), it does not achieve the previously described desired distance properties particularly well and only improves them marginally. Positive effects can be seen in Figure 4.6 for the sketch and cartoon domains, Figure B.10 for the sketch domain, and Figure B.9 for the cartoon domain.
Our second approach to enforce the desired distance structures is to add an additional intra-class prototype loss term
where the
Influence of the negative weight on the distance metrics. We also analyze the discrepancies between the distance metrics when comparing

Figure 4.6: Pairwise learned prototype distances of the best-performing model on each testing domain with a negative weight of
Instead of directly learning the set of prototypes
Contrary to the previous approach, by averaging all the average-pooled latent representations, there exists only one prototype for each class, i.e.
In particular, similar to Transformers [189] and the original idea by Doersch, Gupta, and Zisserman [43], we use three linear transformations to compute keys, values, and queries. In practice, the key head
Afterwards, we re-scale this dot similarity by using
Finally, we can use the support-set values
As a distance, we compute the dot similarity between the prototype and the query image values
the query image:
Most notably, we deploy the dot similarity as a distance metric here instead of the squared distance used by Doersch, Gupta, and Zisserman [43]
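The full attention-based prototype step can be sketched end to end. The head definitions, dimensions, and pooling assumptions below are illustrative; only the overall structure (keys/values/queries, re-scaled dot similarity, dot-similarity scoring) follows the description above.

```python
import torch
import torch.nn as nn

d_in, d_k = 128, 64                      # assumed feature and projection sizes
key_head = nn.Linear(d_in, d_k, bias=False)
value_head = nn.Linear(d_in, d_k, bias=False)
query_head = nn.Linear(d_in, d_k, bias=False)

def attention_prototype_score(support, query):
    # support: (S, d_in) pooled features of one class's support images;
    # query: (d_in,) pooled features of the query image. Attention over the
    # support set forms the class prototype; the final score is the dot
    # similarity between the prototype and the query values, not a squared
    # distance.
    k, v = key_head(support), value_head(support)
    q = query_head(query)
    attn = torch.softmax(q @ k.T / d_k ** 0.5, dim=-1)  # re-scaled dot similarity
    prototype = attn @ v                                # (d_k,) weighted support values
    return prototype @ value_head(query)
```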
In an effort to improve comparability and reproducibility, we use DomainBed [73] for all ablation studies and experimental results. We compare against all methods currently in DomainBed, which includes our provided implementation of RSC. Further, all experiments show results for both training-domain validation, which assumes that training and testing domains have similar distributions, and oracle validation, which has limited access to the testing domain. We omit leave-one-domain-out cross-validation as it requires the most computational resources and performs the worst of the three validation techniques currently available in DomainBed [73].
Since the size of the validation dataset can have a heavy impact on performance, we follow the design choices of DomainBed and choose for each domain
For the main results, we use the official DomainBed hyperparameter distributions as well as the Adam optimizer with no learning rate schedule. This corresponds to the setup in which all baselines from Table 5.2 have been evaluated by Gulrajani and Lopez-Paz [73], providing a fair comparison. For the hyperparameters introduced by our methods, we choose distributions similar to those used in the ablation studies of Section 5.4. Further, Section 5.4 also shows the official DomainBed distributions for all other shared hyperparameters such as the learning rate
The high-level results for DivCAM-S inside the DomainBed framework and across datasets are shown in Table 5.2. For completeness, we also show results outside of DomainBed on the official PACS split in Table 5.3 using a ResNet-18 backbone. The full results, including the performance when choosing any domain inside each dataset as the testing domain, are shown in Appendix A. While we are able to achieve state-of-the-art performance outside of the DomainBed framework, utilizing learning rate schedules and hyperparameter fine-tuning, we are also able to achieve Top-4 performance across five datasets within DomainBed. Notably, we outperform RSC in almost every scenario while exhibiting a smaller standard deviation. This leads to more stable results and a method which can easily be used as a plug-and-play approach. In comparison with all other methods, we achieve good performance on the two most challenging datasets, namely Terra Incognita (Top-2) and DomainNet (Top-4), outperforming RSC on both datasets by up to 2%. This suggests that directly resorting to Grad-CAMs in DivCAM-S provides value specifically for the more challenging datasets in DomainBed. Keep in mind that this is possible without adding any additional parameters, while providing a framework in which the intermediate class activation maps that guide the network's training process can be visualized. While this might not be the best explainability method, it certainly offers more insights than treating the optimization procedure as a black box without this guidance.
| Algorithm | P | A | C | S | Avg. |
| DivCAM-S | \( {94.4} \pm {0.7} \) | \( {80.5} \pm {0.4} \) | \( {74.6} \pm {2.2} \) | \( {79.0} \pm {0.9} \) | \( {82.1} \pm {0.3} \) |
| ProDrop | \( {93.6} \pm {0.6} \) | \( {82.1} \pm {0.9} \) | \( {76.4} \pm {0.9} \) | \( {76.3} \pm {0.6} \) | \( {82.1} \pm {0.6} \) |
| D-TRANSFORMERS | \( {93.7} \pm {0.6} \) | \( {80.1} \pm {1.2} \) | \( {73.3} \pm {1.3} \) | \( {72.6} \pm {1.5} \) | \( {79.9} \pm {0.1} \) |
Table 5.1: Performance comparison of the proposed methods for the PACS dataset with a ResNet-18 backbone.
The fact that we are able to achieve state-of-the-art results for DivCAM-S outside of DomainBed highlights a common problem with works in domain generalization, namely consistency in algorithm comparisons and reproducibility. Due to computational constraints, novel methods are often only compared to the results provided in previous works. As a result, details such as the hyperparameter tuning procedure, learning rate schedules, or even the optimizer are often omitted or chosen to fit the algorithm at hand. From some of our experiments, we observe that simply fine-tuning a learning rate schedule for any of the methods from Table 5.2 offers a bigger performance increase than choosing a better algorithm in the first place. As such, design choices can have a heavy impact on how well an algorithm performs. Having a common benchmarking procedure such as DomainBed, where these choices are fixed, is necessary to make substantial progress in this field. We hope that we can push adoption in the community with our addition of RSC and the methods proposed in this work. However, not using learning rate schedules and following the pre-defined distributions for learning rates, batch sizes, or weight decays might inherently bias this comparison and should not be neglected as a factor.
On top of that, Table 5.2 also shows the results for D-TRANSFORMERS. Generally, we observe that many variants of the prototype-based approaches outlined in Section 3.2.3 fail to generalize well to ResNet-50, even though they exhibit promising performance on ResNet-18 (see Table 5.1 for ProDrop results on PACS with ResNet-18). This might be due to the fact that prototypical approaches commonly require higher-resolution feature maps, e.g. obtained through dilated convolutions as in [43]. Nevertheless, D-TRANSFORMERS performs well on the DomainBed benchmark even without these additional adjustments. Incorporating them could further boost performance and make the approach even more suitable for domain generalization, although it is unclear how much other algorithms would benefit from them as well. The value of prototypical approaches lies not only in good performance but also in their notable explainability, since they allow visualizing prototype similarity maps, the image patches closest to each prototype, and, if a suitable decoder is jointly trained, even the prototypes themselves. In particular, D-TRANSFORMERS achieves the second-best performance on the VLCS dataset in DomainBed, but appears to struggle on the TerraIncognita dataset. We believe this is because this dataset often contains only parts of different animals at large distances, making it hard for the network to extract meaningful prototypes with the limited feature resolution; consequently, it should be among the hardest datasets for this approach.
| Algorithm | Ref. | VLCS | PACS | Office-Home | Terra Inc. | DomainNet | Avg. |
| ERM | [187] | \( {77.5} \pm {0.4} \) | \( {85.5} \pm {0.2} \) | \( {66.5} \pm {0.3} \) | \( {46.1} \pm {1.8} \) | \( {40.9} \pm {0.1} \) | 63.3 |
| IRM | [9] | \( {78.5} \pm {0.5} \) | \( {83.5} \pm {0.8} \) | \( {64.3} \pm {2.2} \) | \( {47.6} \pm {0.8} \) | \( {33.9} \pm {2.8} \) | 61.5 |
| GroupDRO | [159] | \( {76.7} \pm {0.6} \) | \( {84.4} \pm {0.8} \) | \( {66.0} \pm {0.7} \) | \( {43.2} \pm {1.1} \) | \( {33.3} \pm {0.2} \) | 60.7 |
| Mixup | [203] | \( {77.4} \pm {0.6} \) | \( {84.6} \pm {0.6} \) | \( {68.1} \pm {0.3} \) | \( {47.9} \pm {0.8} \) | \( {39.2} \pm {0.1} \) | 63.4 |
| MLDG | [108] | \( {77.2} \pm {0.4} \) | \( {84.9} \pm {1.0} \) | \( {66.8} \pm {0.6} \) | \( {47.7} \pm {0.9} \) | \( {41.2} \pm {0.1} \) | 63.5 |
| CORAL | [179] | \( {78.8} \pm {0.6} \) | \( {86.2} \pm {0.3} \) | \( {68.7} \pm {0.3} \) | \( {47.6} \pm {1.0} \) | \( {41.5} \pm {0.1} \) | 64.5 |
| MMD | [112] | \( {77.5} \pm {0.9} \) | \( {84.6} \pm {0.5} \) | \( {66.3} \pm {0.1} \) | \( {42.2} \pm {1.6} \) | \( {23.4} \pm {9.5} \) | 58.8 |
| DANN | [61] | \( {78.6} \pm {0.4} \) | \( {83.6} \pm {0.4} \) | \( {65.9} \pm {0.6} \) | \( {46.7} \pm {0.5} \) | \( {38.3} \pm {0.1} \) | 62.6 |
| CDANN | [116] | \( {77.5} \pm {0.1} \) | \( {82.6} \pm {0.9} \) | \( {65.8} \pm {1.3} \) | \( {45.8} \pm {1.6} \) | \( {38.3} \pm {0.3} \) | 62.0 |
| MTL | [19] | \( {77.2} \pm {0.4} \) | \( {84.6} \pm {0.5} \) | \( {66.4} \pm {0.5} \) | \( {45.6} \pm {1.2} \) | \( {40.6} \pm {0.1} \) | 62.8 |
| SagNet | [135] | \( {77.8} \pm {0.5} \) | \( {86.3} \pm {0.2} \) | \( {68.1} \pm {0.1} \) | \( {48.6} \pm {1.0} \) | \( {40.3} \pm {0.1} \) | 64.2 |
| ARM | [212] | \( {77.6} \pm {0.3} \) | \( {85.1} \pm {0.4} \) | \( {64.8} \pm {0.3} \) | \( {45.5} \pm {0.3} \) | \( {35.5} \pm {0.2} \) | 61.7 |
| VREx | [101] | \( {78.3} \pm {0.2} \) | \( {84.9} \pm {0.6} \) | \( {66.4} \pm {0.6} \) | \( {46.4} \pm {0.6} \) | \( {33.6} \pm {2.9} \) | 61.9 |
| RSC | [86] | \( {77.1} \pm {0.5} \) | \( {85.2} \pm {0.9} \) | \( {65.5} \pm {0.9} \) | \( {46.6} \pm {1.0} \) | \( {38.9} \pm {0.5} \) | 62.7 |
| DIVCAM-S | (ours) | \( {77.8} \pm {0.3} \) | \( {85.4} \pm {0.2} \) | \( {65.2} \pm {0.3} \) | \( {48.0} \pm {1.2} \) | \( {40.7} \pm {0.0} \) | 63.4 |
| D-TRANSFORMERS | (ours) | \( {78.7} \pm {0.5} \) | \( {84.2} \pm {0.1} \) | - | \( {42.9} \pm {1.1} \) | - | - |
| ERM* | [187] | \( {77.6} \pm {0.3} \) | \( {86.7} \pm {0.3} \) | \( {66.4} \pm {0.5} \) | \( {53.0} \pm {0.3} \) | \( {41.3} \pm {0.1} \) | 65.0 |
| IRM* | [9] | \( {76.9} \pm {0.6} \) | \( {84.5} \pm {1.1} \) | \( {63.0} \pm {2.7} \) | \( {50.5} \pm {0.7} \) | \( {28.0} \pm {5.1} \) | 60.5 |
| GroupDRO* | [159] | \( {77.4} \pm {0.5} \) | \( {87.1} \pm {0.1} \) | \( {66.2} \pm {0.6} \) | \( {52.4} \pm {0.1} \) | \( {33.4} \pm {0.3} \) | 63.3 |
| Mixup* | [203] | \( {78.1} \pm {0.3} \) | \( {86.8} \pm {0.3} \) | \( {68.0} \pm {0.2} \) | \( {54.4} \pm {0.3} \) | \( {39.6} \pm {0.1} \) | 65.3 |
| MLDG* | [108] | \( {77.5} \pm {0.1} \) | \( {86.8} \pm {0.4} \) | \( {66.6} \pm {0.3} \) | \( {52.0} \pm {0.1} \) | \( {41.6} \pm {0.1} \) | 64.9 |
| CORAL* | [179] | \( {77.7} \pm {0.2} \) | \( {87.1} \pm {0.5} \) | \( {68.4} \pm {0.2} \) | \( {52.8} \pm {0.2} \) | \( {41.8} \pm {0.1} \) | 65.5 |
| MMD* | [112] | \( {77.9} \pm {0.1} \) | \( {87.2} \pm {0.1} \) | \( {66.2} \pm {0.3} \) | \( {52.0} \pm {0.4} \) | \( {23.5} \pm {9.4} \) | 61.3 |
| DANN* | [61] | \( {79.7} \pm {0.5} \) | \( {85.2} \pm {0.2} \) | \( {65.3} \pm {0.8} \) | \( {50.6} \pm {0.4} \) | \( {38.3} \pm {0.1} \) | 63.8 |
| CDANN* | [116] | \( {79.9} \pm {0.2} \) | \( {85.8} \pm {0.8} \) | \( {65.3} \pm {0.5} \) | \( {50.8} \pm {0.6} \) | \( {38.5} \pm {0.2} \) | 64.0 |
| MTL* | [19] | \( {77.7} \pm {0.5} \) | \( {86.7} \pm {0.2} \) | \( {66.5} \pm {0.4} \) | \( {52.2} \pm {0.4} \) | \( {40.8} \pm {0.1} \) | 64.7 |
| SagNet* | [135] | \( {77.6} \pm {0.1} \) | \( {86.4} \pm {0.4} \) | \( {67.5} \pm {0.2} \) | \( {52.5} \pm {0.4} \) | \( {40.8} \pm {0.2} \) | 64.9 |
| ARM* | [212] | \( {77.8} \pm {0.3} \) | \( {85.8} \pm {0.2} \) | \( {64.8} \pm {0.4} \) | \( {51.2} \pm {0.5} \) | \( {36.0} \pm {0.2} \) | 63.1 |
| VREx* | [101] | \( {78.1} \pm {0.2} \) | \( {87.2} \pm {0.6} \) | \( {65.7} \pm {0.3} \) | \( {51.4} \pm {0.5} \) | \( {30.1} \pm {3.7} \) | 62.5 |
| RSC* | [86] | \( {77.8} \pm {0.6} \) | \( {86.2} \pm {0.5} \) | \( {66.5} \pm {0.6} \) | \( {52.1} \pm {0.2} \) | \( {38.9} \pm {0.6} \) | 64.3 |
| DIVCAM-S* | (ours) | \( {78.1} \pm {0.6} \) | \( {87.2} \pm {0.1} \) | \( {65.2} \pm {0.5} \) | \( {51.3} \pm {0.5} \) | \( {41.0} \pm {0.0} \) | 64.6 |
| D-TRANSFORMERS* | (ours) | \( {77.7} \pm {0.1} \) | \( {86.9} \pm {0.3} \) | - | \( {52.4} \pm {0.8} \) | - | - |
Table 5.2: Performance comparison across datasets using training-domain validation (top) and oracle validation, denoted with * (bottom). We use a ResNet-50 backbone, optimize with ADAM, and follow the distributions specified in DOMAINBED. Only RSC and our methods have been added as part of this work; the other baselines are taken from DOMAINBED.
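The two model-selection protocols in Table 5.2 can be sketched as follows. This is a simplified sketch; the dictionary field names are ours, and each `run` stands for one trained model from the random search.

```python
def select_by_training_domain(runs):
    """Training-domain validation: choose the run with the best accuracy
    on held-out splits of the *training* domains. This is the realistic
    protocol, since the test domain is never seen during selection."""
    return max(runs, key=lambda r: r["train_domain_val_acc"])

def select_by_oracle(runs):
    """Oracle validation (marked * in Table 5.2): choose the run with the
    best accuracy on the *test* domain itself. This acts as an upper
    bound and is not available in practice."""
    return max(runs, key=lambda r: r["test_domain_acc"])

runs = [
    {"id": 0, "train_domain_val_acc": 0.81, "test_domain_acc": 0.70},
    {"id": 1, "train_domain_val_acc": 0.79, "test_domain_acc": 0.74},
]
```

As the toy example shows, the two criteria can select different runs, which is why DOMAINBED reports both separately.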
D-TRANSFORMERS in DOMAINBED. In Table 5.2, the results for D-TRANSFORMERS on Office-Home and DomainNet are omitted due to computational constraints, but we would expect the performance to be on par with, if not better than, the other methods, as these datasets do not share the prototype extraction difficulties of TerraIncognita.
In this section, we examine different ablations of the presented methods and how the individual components impact performance. In particular, for DIVCAM-S, we analyze different methods of resetting the masks (mask batching) in Section 5.4.2 and evaluate in Section 5.4.3 how additional methods designed to improve the underlying class activation maps affect performance. On top of that, for PRODROP, we evaluate the effect of self-challenging for different negative weights in Section 5.4.4, as well as the impact of the additional intra-loss factor in Section 5.4.5.
For the mask batching ablation study, we use ADAM [98] and the distributions from Table 5.4. When the batch drop factor is scheduled, we use an increasing linear schedule, while the learning rate is always scheduled with a step decay that reduces the learning rate by a factor of 0.1 at epoch 80 of 100.
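The two schedules used here can be sketched as follows. This is a minimal sketch; the step decay matches the description above, while the exact endpoints of the linear ramp for the batch drop factor are an assumption.

```python
def step_decay_lr(base_lr, epoch, decay_epoch=80, factor=0.1):
    """Step decay: multiply the learning rate by `factor` once training
    reaches `decay_epoch` (here epoch 80 of 100)."""
    return base_lr * factor if epoch >= decay_epoch else base_lr

def linear_batch_drop(b_final, epoch, total_epochs=100):
    """Increasing linear schedule for the batch drop factor b: ramps from
    0 at the first epoch to `b_final` at the last (assumed endpoints)."""
    return b_final * epoch / (total_epochs - 1)
```

For example, with a base learning rate of 1e-2 the optimizer steps with 1e-2 until epoch 79 and with 1e-3 from epoch 80 onward, while the batch drop factor grows linearly toward its sampled value.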
| Algorithm | Ref. | Backbone | \( \mathbf{P} \) | A | C | S | Avg. |
| BASELINE | [28] | ResNet-18 | 95.73 | 77.85 | 74.86 | 67.74 | 79.05 |
| MASF | [46] | ResNet-18 | 94.99 | 80.29 | 77.17 | 71.69 | 81.03 |
| EPI-FCR | [110] | ResNet-18 | 93.90 | 82.10 | 77.00 | 73.00 | 81.50 |
| JIGEN | [28] | ResNet-18 | 96.03 | 79.42 | 75.25 | 71.35 | 80.51 |
| MetaReg | [15] | ResNet-18 | 95.50 | 83.70 | 77.20 | 70.30 | 81.70 |
| RSC (reported) | [86] | ResNet-18 | 95.99 | 83.43 | 80.31 | 80.85 | 85.15 |
| RSC (reproduced) | [86] | ResNet-18 | 93.73 | 80.41 | 77.53 | 80.79 | 83.12 |
| DIVCAM-S | (ours) | ResNet-18 | 96.11 | 80.27 | 77.82 | 82.18 | 84.10 |
Table 5.3: Performance comparison for PACS outside of the DOMAINBED framework with the official data split.
| Hyperparameter | Distribution | |
| \( \alpha \) | learning rate | \( {\mathcal{{LU}}}_{10}\left( {-5, - 1}\right) \) |
| \( \mathcal{B} \) | batch size | \( \lfloor {\mathcal{{LU}}}_{2}\left( {3,9}\right) \rfloor \) |
| \( \gamma \) | weight decay | \( {\mathcal{{LU}}}_{10}\left( {-6, - 2}\right) \) |
| \( p \) | feature drop factor | 1/3 |
| \( b \) | batch drop factor | \( \mathcal{U}\left( {0,1}\right) \) |
Table 5.4: Hyperparameters and distributions used in random search for the mask batching ablation study.
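For illustration, one random-search sample following the distributions of Table 5.4 could be drawn as below, where \( {\mathcal{{LU}}}_{b}\left( {lo, hi}\right) \) denotes a log-uniform draw \( b^{\mathcal{U}\left( {lo, hi}\right)} \). The function name is ours; this is a sketch of the sampling protocol, not the DOMAINBED implementation.

```python
import random

def sample_hyperparameters(rng=random):
    """Draw one random-search sample following Table 5.4."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),   # LU_10(-5, -1)
        "batch_size": int(2 ** rng.uniform(3, 9)),    # floor(LU_2(3, 9))
        "weight_decay": 10 ** rng.uniform(-6, -2),    # LU_10(-6, -2)
        "feature_drop_factor": 1 / 3,                 # fixed at 1/3
        "batch_drop_factor": rng.uniform(0.0, 1.0),   # U(0, 1)
    }

# 25 hyperparameter samples, as in the mask batching ablation (Table 5.6).
samples = [sample_hyperparameters() for _ in range(25)]
```

Sampling the learning rate and weight decay log-uniformly spreads the trials evenly across orders of magnitude, which is why these ranges are specified as exponents.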
For the mask ablation study, we use ADAM [98] and the distributions from Table 5.5. When the batch drop factor is scheduled, we use an increasing linear schedule, while the learning rate is not scheduled. This corresponds to the tuning distributions provided in DOMAINBED, which are also used for all main results and all other ablations. Unless marked otherwise, each experiment evaluates 20 hyperparameter samples, following the suggestion in DOMAINBED.
There exist several methods for computing the vector
Another option is DIVCAM-C with Equation (5.2), which computes the change in confidence on the ground-truth class when applying the mask. The masked confidence after the softmax is denoted as
The last variation is DIVCAM-T, where we apply the masks randomly to samples which are correctly classified. All variants can further be extended by adding a linear schedule, denoted with an additional "S", or by computing
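The confidence-based DIVCAM-C score can be sketched as follows. This is a minimal sketch assuming Equation (5.2) compares the softmax confidence on the ground-truth class before and after masking; the function names are ours, and `masked_logits` stands for a second forward pass with the class activation mask applied to the feature maps.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def confidence_change(logits, masked_logits, label):
    """Drop in softmax confidence on the ground-truth class once the
    discriminative regions are masked out; a large positive value means
    the masked features carried most of the evidence for the class."""
    return softmax(logits)[label] - softmax(masked_logits)[label]

# Masking the discriminative features lowers class-0 confidence:
delta = confidence_change([2.0, 0.5, -1.0], [0.3, 0.4, -0.2], label=0)
```

A per-sample score of this kind then decides how strongly the mask is applied for that sample.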
| Hyperparameter | Distribution | |
| \( \alpha \) | learning rate | \( {\mathcal{{LU}}}_{10}\left( {-5, - {3.5}}\right) \) |
| \( \mathcal{B} \) | batch size | \( \lfloor {\mathcal{{LU}}}_{2}\left( {3,{5.5}}\right) \rfloor \) |
| \( \gamma \) | weight decay | \( {\mathcal{{LU}}}_{10}\left( {-6, - 2}\right) \) |
| \( p \) | feature drop factor | \( \mathcal{U}\left( {{0.2},{0.5}}\right) \) |
| \( b \) | batch drop factor | \( \mathcal{U}\left( {0,1}\right) \) |
| \( {\lambda }_{1} \) | hnc factor | \( {\mathcal{{LU}}}_{10}\left( {-3, - 1}\right) \) |
| \( k \) | negative classes | num_classes -1 |
| \( {\lambda }_{tap} \) | tap factor | \( \mathcal{U}\left( {0,1}\right) \) |
| \( {\lambda }_{2} \) | adversarial factor | \( {\mathcal{{LU}}}_{10}\left( {-2,2}\right) \) |
| \( S \) | disc steps per gen step | \( \lfloor {\mathcal{{LU}}}_{2}\left( {0,3}\right) \rfloor \) |
| \( \eta \) | gradient penalty | \( {\mathcal{{LU}}}_{10}\left( {-2,1}\right) \) |
| \( {\omega }_{s} \) | mlp width | 512 |
| \( {\omega }_{d} \) | mlp depth | 3 |
| \( {\omega }_{dr} \) | mlp dropout | 0.5 |
| \( {\lambda }_{3} \) | mmd factor | \( {\mathcal{{LU}}}_{10}\left( {-1,1}\right) \) |
Table 5.5: Hyperparameters and distributions used in random search for the mask ablation study.
We observe that adding a schedule helps in most cases, with DIVCAM-S achieving the highest training-domain validation performance. Enforcing the application of masks within each domain, however, does not consistently improve performance, and we therefore do not consider it for the final method.
We combine our class activation maps with other methods from domain generalization, as well as with methods from the weakly-supervised object localization literature that are designed to improve the explainability of class activation maps. The results are shown in Table 5.7, where CAM + CDANN drops the self-challenging part from DIVCAM and computes the ordinary cross entropy while aligning the class activation maps.
Surprisingly, we observe that none of the methods has a positive effect under training-domain validation, even though some of them exhibit better performance under oracle validation. Notably, the CDANN approach tends to exhibit a high standard deviation on some of the domains (e.g. art), which suggests that it can appear stronger when performance is only reported for a single seed. However, since we are looking for a method that reliably provides competitive results regardless of the seed and without extensive fine-tuning, we disregard this option.
Table 5.8 shows the ablation results for the self-challenging addition. We observe that for most negative weights, adding self-challenging results in a performance increase. Most notably, this occurs in cases where the performance without self-challenging is very poor, such as
If we look at the performance changes based solely on the different negative weights, we cannot observe any consistent trends. In fact, it is very surprising that small changes such as from
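Self-challenging in the spirit of RSC [86] can be sketched as follows. This is a simplified per-sample sketch on a vector of channel features; the exact percentile scheme and the batch-level variant of our implementation are not reproduced here.

```python
import numpy as np

def self_challenge(features, grads, drop_fraction=1 / 3):
    """Mute the fraction of feature channels with the largest gradient
    response so the network is forced to rely on its remaining, less
    dominant features. `features` and `grads` have shape (C,)."""
    c = features.shape[0]
    k = int(round(c * drop_fraction))    # number of channels to mute
    top = np.argsort(grads)[-k:]         # most discriminative channels
    masked = features.copy()
    masked[top] = 0.0                    # zero them out for this step
    return masked

out = self_challenge(np.ones(6), np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7]))
```

In the toy call, the two channels with the largest gradients (indices 1 and 3) are muted, and the loss for the step is then computed on the masked features.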
| Name | P | A | C | S | \( \mathbf{{Avg}.} \) |
| DIVCAM | \( {94.0} \pm {0.4} \) | \( {80.6} \pm {1.2} \) | \( {75.4} \pm {0.7} \) | \( {76.7} \pm {0.7} \) | \( {81.7} \pm {0.6} \) |
| DIVCAM-S | \( {94.4} \pm {0.7} \) | \( {80.5} \pm {0.4} \) | \( {74.6} \pm {2.2} \) | \( {79.0} \pm {0.9} \) | \( {82.1} \pm {0.3} \) |
| DIVCAM-D | \( {94.3} \pm {0.1} \) | \( {80.1} \pm {0.1} \) | \( {74.5} \pm {0.9} \) | \( {76.6} \pm {1.7} \) | \( {81.4} \pm {0.2} \) |
| DIVCAM-DS | \( {93.9} \pm {0.2} \) | \( {80.4} \pm {0.4} \) | \( {73.4} \pm {2.2} \) | \( {74.8} \pm {1.2} \) | \( {80.6} \pm {0.9} \) |
| DIVCAM-C | \( {92.6} \pm {0.4} \) | \( {80.1} \pm {1.1} \) | \( {73.6} \pm {1.4} \) | \( {75.0} \pm {1.2} \) | \( {80.3} \pm {0.9} \) |
| DIVCAM-CS | \( {95.0} \pm {0.6} \) | \( {79.9} \pm {1.0} \) | \( {74.5} \pm {0.7} \) | \( {78.1} \pm {0.8} \) | \( {81.9} \pm {0.4} \) |
| DIVCAM-DC | \( {95.1} \pm {0.4} \) | \( {79.5} \pm {1.0} \) | \( {73.7} \pm {0.9} \) | \( {75.2} \pm {1.2} \) | \( {80.9} \pm {0.4} \) |
| DIVCAM-DCS | \( {93.5} \pm {0.1} \) | \( {80.1} \pm {0.2} \) | \( {75.1} \pm {0.1} \) | \( {77.2} \pm {1.6} \) | \( {81.5} \pm {0.5} \) |
| DIVCAM-T | \( {95.0} \pm {0.3} \) | \( {80.3} \pm {0.3} \) | \( {74.8} \pm {0.8} \) | \( {75.3} \pm {1.1} \) | \( {81.4} \pm {0.4} \) |
| DIVCAM-TS | \( {95.0} \pm {0.1} \) | \( {79.9} \pm {0.8} \) | \( {72.6} \pm {1.3} \) | \( {77.1} \pm {1.4} \) | \( {81.2} \pm {0.4} \) |
| DIVCAM-DT | \( {94.8} \pm {0.6} \) | \( {79.6} \pm {0.6} \) | \( {74.0} \pm {1.1} \) | \( {78.5} \pm {0.4} \) | \( {81.7} \pm {0.1} \) |
| DIVCAM-DTS | \( {95.1} \pm {0.2} \) | \( {81.5} \pm {1.3} \) | \( {75.5} \pm {0.4} \) | \( {74.9} \pm {2.0} \) | \( {81.7} \pm {0.5} \) |
| DIVCAM* | \( {94.9} \pm {0.7} \) | \( {81.5} \pm {0.7} \) | \( {76.6} \pm {0.4} \) | \( {80.5} \pm {0.7} \) | \( {83.4} \pm {0.3} \) |
| DIVCAM-S* | \( {94.9} \pm {0.3} \) | \( {82.7} \pm {0.7} \) | \( {76.3} \pm {0.7} \) | \( {80.1} \pm {0.4} \) | \( {83.5} \pm {0.3} \) |
| DIVCAM-D* | \( {94.8} \pm {0.2} \) | \( {81.0} \pm {0.7} \) | \( \mathbf{{77.6} \pm {0.6}} \) | \( {79.9} \pm {0.6} \) | \( {83.3} \pm {0.3} \) |
| DIVCAM-DS* | \( {94.6} \pm {0.5} \) | \( {80.7} \pm {0.3} \) | \( {77.0} \pm {0.4} \) | \( {79.3} \pm {0.3} \) | \( {82.9} \pm {0.1} \) |
| DIVCAM-C* | \( {94.7} \pm {0.5} \) | \( {82.6} \pm {0.6} \) | \( {77.0} \pm {0.5} \) | \( {80.1} \pm {1.0} \) | \( \mathbf{{83.6} \pm {0.3}} \) |
| DIVCAM-CS* | \( {94.2} \pm {0.2} \) | \( {82.5} \pm {0.8} \) | \( {76.9} \pm {0.3} \) | \( {79.9} \pm {0.7} \) | \( {83.4} \pm {0.3} \) |
| DIVCAM-DC* | \( {94.8} \pm {0.4} \) | \( {82.0} \pm {0.4} \) | \( {76.6} \pm {0.9} \) | \( {80.1} \pm {0.4} \) | \( {83.4} \pm {0.1} \) |
| DIVCAM-DCS* | \( {94.7} \pm {0.4} \) | \( {81.0} \pm {0.3} \) | \( {77.6} \pm {0.2} \) | \( {80.3} \pm {1.3} \) | \( {83.4} \pm {0.3} \) |
| DIVCAM-T* | \( {94.5} \pm {0.4} \) | \( {81.6} \pm {0.8} \) | \( {76.7} \pm {0.2} \) | \( {79.6} \pm {0.4} \) | \( {83.1} \pm {0.4} \) |
| DIVCAM-TS* | \( {94.8} \pm {0.3} \) | \( {81.3} \pm {0.2} \) | \( {76.7} \pm {0.5} \) | \( {79.7} \pm {0.5} \) | \( {83.2} \pm {0.2} \) |
| DIVCAM-DT* | \( {94.7} \pm {0.5} \) | \( {80.9} \pm {1.1} \) | \( {77.3} \pm {0.5} \) | \( {79.9} \pm {0.6} \) | \( {83.2} \pm {0.2} \) |
| DIVCAM-DTS* | \( {94.7} \pm {0.5} \) | \( {82.1} \pm {1.0} \) | \( {76.4} \pm {0.6} \) | \( {79.5} \pm {1.2} \) | \( {83.2} \pm {0.1} \) |
Table 5.6: Ablation study for the DIVCAM mask batching on the PACS dataset using training-domain validation (top) and oracle validation denoted with * (bottom). We use a ResNet-18 backbone, schedules and distributions from Section 5.4.1, 25 hyperparameter samples, and 3 split seeds for standard deviations.
Table 5.9 shows the ablation results for different intra factor strengths
Even though we experimented with different weightings of the individual distance metrics as well as of the overall loss, we observe no consistent performance improvements on top of self-challenging across testing environments and data splits. The observations from Section 5.4.5, Figure 4.6, and Appendix B suggest that self-challenging alone already enforces the desired properties to a near-optimal extent, such that the additional loss term does not provide any consistent further benefit.
| Name | \( \mathbf{P} \) | A | C | S | Avg. |
| DIVCAM | \( {97.6} \pm {0.4} \) | \( {85.2} \pm {0.8} \) | \( {80.5} \pm {0.7} \) | \( {78.3} \pm {0.8} \) | \( {85.4} \pm {0.5} \) |
| DIVCAM-S | \( {97.3} \pm {0.4} \) | \( {86.2} \pm {1.4} \) | \( {79.1} \pm {2.2} \) | \( {79.2} \pm {0.1} \) | \( {85.4} \pm {0.2} \) |
| DIVCAM-S + TAP | \( {96.9} \pm {0.1} \) | \( {85.1} \pm {1.5} \) | \( {78.7} \pm {0.4} \) | \( {75.3} \pm {0.6} \) | \( {84.0} \pm {0.4} \) |
| DIVCAM-S + HNC | \( {97.2} \pm {0.3} \) | \( {87.2} \pm {0.9} \) | \( {79.2} \pm {0.6} \) | \( {71.7} \pm {3.1} \) | \( {83.8} \pm {0.4} \) |
| DIVCAM-S + CDANN | \( {97.5} \pm {0.4} \) | \( {85.2} \pm {2.8} \) | \( {78.3} \pm {2.0} \) | \( {74.8} \pm {0.9} \) | \( {84.0} \pm {1.5} \) |
| DIVCAM-S + MMD | \( {97.0} \pm {0.2} \) | \( {85.4} \pm {1.0} \) | \( {81.5} \pm {0.4} \) | \( {75.8} \pm {3.5} \) | \( {84.9} \pm {1.1} \) |
| CAM + CDANN | \( {97.2} \pm {0.3} \) | \( {86.7} \pm {0.5} \) | \( {77.3} \pm {1.7} \) | \( {71.5} \pm {1.3} \) | \( {83.2} \pm {0.8} \) |
| DIVCAM* | \( {96.2} \pm {1.2} \) | \( {87.0} \pm {0.5} \) | \( {82.0} \pm {0.9} \) | \( {80.8} \pm {0.6} \) | \( {86.5} \pm {0.1} \) |
| DIVCAM-S* | \( {97.2} \pm {0.3} \) | \( {86.5} \pm {0.4} \) | \( {83.0} \pm {0.5} \) | \( {82.2} \pm {0.1} \) | \( {87.2} \pm {0.1} \) |
| DIVCAM-S + TAP* | \( {97.3} \pm {0.3} \) | \( {87.2} \pm {0.8} \) | \( \mathbf{{83.2} \pm {0.8}} \) | \( {82.8} \pm {0.2} \) | \( {87.6} \pm {0.0} \) |
| DIVCAM-S + HNC* | \( {97.3} \pm {0.2} \) | \( {87.4} \pm {0.5} \) | \( {81.4} \pm {0.6} \) | \( {79.7} \pm {1.1} \) | \( {86.5} \pm {0.4} \) |
| DIVCAM-S + CDANN* | \( {97.3} \pm {0.5} \) | \( {85.9} \pm {1.2} \) | \( {80.6} \pm {0.4} \) | \( {80.9} \pm {0.4} \) | \( {86.2} \pm {0.2} \) |
| DIVCAM-S + MMD* | \( {97.3} \pm {0.5} \) | \( {86.8} \pm {0.7} \) | \( {83.2} \pm {0.4} \) | \( {80.9} \pm {0.7} \) | \( {87.1} \pm {0.4} \) |
| CAM + CDANN* | \( {97.2} \pm {0.4} \) | \( {86.7} \pm {0.5} \) | \( {81.9} \pm {0.2} \) | \( {80.6} \pm {0.7} \) | \( {86.6} \pm {0.2} \) |
Table 5.7: Ablation study for the DIVCAM masks on the PACS dataset using training-domain validation (top) and oracle validation, denoted with * (bottom). We use a ResNet-50 backbone, schedules and distributions from Section 5.4.1, 20 hyperparameter samples, and 3 split seeds for standard deviations. Results can be directly integrated into Table 5.2, as we use the same tuning protocol provided in DOMAINBED.
| Weight | SC | P | A | C | S | Avg. |
| 0.0 | ✘ | \( {93.2} \pm {0.0} \) | \( {80.4} \pm {1.0} \) | \( {73.7} \pm {0.4} \) | \( {72.6} \pm {2.6} \) | \( {80.0} \pm {0.7} \) |
| -0.1 | ✘ | \( {94.3} \pm {0.3} \) | \( {78.8} \pm {0.3} \) | \( {74.4} \pm {0.7} \) | \( {75.3} \pm {1.6} \) | \( {80.7} \pm {0.4} \) |
| -0.2 | ✘ | \( {93.2} \pm {0.5} \) | \( {76.8} \pm {1.5} \) | \( {72.9} \pm {0.1} \) | \( {71.8} \pm {1.0} \) | \( {78.7} \pm {0.5} \) |
| -0.3 | ✘ | \( {94.0} \pm {0.4} \) | \( {79.8} \pm {1.0} \) | \( {75.6} \pm {1.5} \) | \( {73.9} \pm {1.0} \) | \( {80.8} \pm {0.1} \) |
| -0.4 | ✘ | \( {93.6} \pm {0.1} \) | \( {79.8} \pm {0.5} \) | \( {74.3} \pm {1.3} \) | \( {75.7} \pm {2.3} \) | \( {80.8} \pm {0.5} \) |
| -0.5 | ✘ | \( {93.0} \pm {0.9} \) | \( {79.4} \pm {1.6} \) | \( {73.2} \pm {0.9} \) | \( {75.5} \pm {1.0} \) | \( {80.3} \pm {0.4} \) |
| -1.0 | ✘ | \( {94.2} \pm {0.4} \) | \( {80.2} \pm {1.1} \) | \( {72.7} \pm {1.4} \) | \( {68.6} \pm {0.6} \) | \( {78.9} \pm {0.2} \) |
| -2.0 | ✘ | \( {94.7} \pm {0.4} \) | \( {78.7} \pm {0.5} \) | \( {75.5} \pm {1.0} \) | \( {71.1} \pm {2.2} \) | \( {80.0} \pm {0.7} \) |
| \( {0.0} \rightarrow - {1.0} \) | ✘ | \( {94.3} \pm {0.5} \) | \( {79.5} \pm {0.6} \) | \( {74.2} \pm {0.3} \) | \( {72.2} \pm {2.1} \) | \( {80.0} \pm {0.4} \) |
| 0.0 | ✓ | \( {93.4} \pm {0.6} \) | \( {80.5} \pm {0.8} \) | \( {75.6} \pm {0.1} \) | \( {74.3} \pm {2.0} \) | \( {81.0} \pm {0.4} \) |
| -0.1 | ✓ | \( {93.7} \pm {0.3} \) | \( {83.2} \pm {1.4} \) | \( {75.9} \pm {1.2} \) | \( {71.1} \pm {1.9} \) | \( {81.0} \pm {0.8} \) |
| -0.2 | ✓ | \( {93.2} \pm {0.2} \) | \( {81.0} \pm {0.7} \) | \( {73.9} \pm {0.7} \) | \( {75.0} \pm {0.6} \) | \( {80.8} \pm {0.1} \) |
| -0.3 | ✓ | \( {93.4} \pm {0.8} \) | \( {81.4} \pm {0.4} \) | \( {71.3} \pm {1.2} \) | \( {76.9} \pm {0.9} \) | \( {80.7} \pm {0.2} \) |
| -0.4 | ✓ | \( {94.0} \pm {0.3} \) | \( {81.7} \pm {0.9} \) | \( {72.9} \pm {0.4} \) | \( {73.5} \pm {1.0} \) | \( {80.5} \pm {0.2} \) |
| -0.5 | ✓ | \( {93.5} \pm {0.7} \) | \( {80.7} \pm {1.6} \) | \( {71.6} \pm {1.4} \) | \( {73.6} \pm {1.5} \) | \( {79.8} \pm {0.9} \) |
| -1.0 | ✓ | \( {94.6} \pm {0.2} \) | \( {81.6} \pm {1.2} \) | \( {72.9} \pm {0.4} \) | \( {77.0} \pm {1.6} \) | \( {81.5} \pm {0.2} \) |
| -2.0 | ✓ | \( {94.0} \pm {0.5} \) | \( {79.5} \pm {1.2} \) | \( {76.4} \pm {0.4} \) | \( {73.9} \pm {1.7} \) | \( {80.9} \pm {0.6} \) |
| \( {0.0} \rightarrow - {1.0} \) | ✓ | \( {94.1} \pm {0.4} \) | \( {79.2} \pm {0.6} \) | \( {74.1} \pm {1.1} \) | \( {71.9} \pm {0.2} \) | \( {79.8} \pm {0.3} \) |
Table 5.8: Performance comparison of different negative class weights on the PACS dataset without (top) and with self-challenging (bottom), using training-domain validation and a ResNet-18 backbone. The feature and batch drop factors are kept constant.
| Intra factor \( {\lambda }_{6} \) | SC | \( \mathbf{P} \) | A | C | S | \( \mathbf{{Avg}.} \) |
| 0.0 | ✘ | \( {94.2} \pm {0.4} \) | \( {80.2} \pm {1.1} \) | \( {72.7} \pm {1.4} \) | \( {68.6} \pm {0.6} \) | \( {78.9} \pm {0.2} \) |
| -0.1 | ✘ | \( {94.1} \pm {0.4} \) | \( {81.2} \pm {1.2} \) | \( {73.6} \pm {0.8} \) | \( {75.2} \pm {2.4} \) | \( {81.0} \pm {0.6} \) |
| -0.2 | ✘ | \( {94.5} \pm {0.4} \) | \( {81.5} \pm {1.1} \) | \( {74.4} \pm {1.6} \) | \( {74.4} \pm {1.1} \) | \( {81.2} \pm {0.5} \) |
| -0.5 | ✘ | \( {92.9} \pm {0.8} \) | \( {82.4} \pm {0.4} \) | \( {73.5} \pm {1.6} \) | \( {73.4} \pm {2.0} \) | \( {80.6} \pm {0.7} \) |
| -1.0 | ✘ | \( {94.7} \pm {0.4} \) | \( {80.4} \pm {0.6} \) | \( {73.9} \pm {0.9} \) | \( {75.2} \pm {1.8} \) | \( {81.1} \pm {0.7} \) |
| 0.0 | ✓ | \( {94.6} \pm {0.2} \) | \( {81.6} \pm {1.2} \) | \( {72.9} \pm {0.4} \) | \( {77.0} \pm {1.6} \) | \( {81.5} \pm {0.2} \) |
| -0.1 | ✓ | \( {94.6} \pm {0.3} \) | \( {82.6} \pm {0.9} \) | \( {72.2} \pm {0.5} \) | \( {75.4} \pm {0.1} \) | \( {81.2} \pm {0.3} \) |
| -0.2 | ✓ | \( {94.1} \pm {0.1} \) | \( {81.9} \pm {0.2} \) | \( {73.3} \pm {0.7} \) | \( {75.8} \pm {0.9} \) | \( {81.2} \pm {0.1} \) |
| -0.3 | ✓ | \( {94.8} \pm {0.2} \) | \( {81.5} \pm {0.1} \) | \( {75.1} \pm {0.0} \) | \( {74.5} \pm {0.4} \) | \( {81.5} \pm {0.2} \) |
| -0.4 | ✓ | \( {93.9} \pm {0.5} \) | \( {82.4} \pm {0.2} \) | \( {74.3} \pm {1.3} \) | \( {76.1} \pm {1.1} \) | \( {81.7} \pm {0.2} \) |
| -0.5 | ✓ | \( {93.6} \pm {0.6} \) | \( {82.1} \pm {0.9} \) | \( {76.4} \pm {0.9} \) | \( {76.3} \pm {0.6} \) | \( {82.1} \pm {0.6} \) |
| -0.6 | ✓ | \( {93.8} \pm {0.6} \) | \( {82.1} \pm {0.3} \) | \( {75.7} \pm {0.2} \) | \( {73.1} \pm {3.1} \) | \( {81.2} \pm {1.0} \) |
| -0.7 | ✓ | \( {94.0} \pm {0.4} \) | \( {82.9} \pm {1.2} \) | \( {74.1} \pm {0.5} \) | \( {76.3} \pm {1.0} \) | \( {81.8} \pm {0.4} \) |
| -0.8 | ✓ | \( {94.1} \pm {0.5} \) | \( {81.8} \pm {0.3} \) | \( {75.5} \pm {0.2} \) | \( {72.7} \pm {0.3} \) | \( {81.0} \pm {0.1} \) |
| -1.0 | ✓ | \( {95.1} \pm {0.6} \) | \( {80.7} \pm {1.5} \) | \( {74.5} \pm {1.1} \) | \( {73.7} \pm {1.2} \) | \( {81.0} \pm {0.4} \) |
| -2.0 | ✓ | \( {94.3} \pm {0.3} \) | \( {82.9} \pm {0.5} \) | \( {75.1} \pm {1.0} \) | \( {75.0} \pm {1.1} \) | \( {81.8} \pm {0.5} \) |
| -3.0 | ✓ | \( {94.5} \pm {0.3} \) | \( {79.8} \pm {1.0} \) | \( {74.9} \pm {1.0} \) | \( {71.2} \pm {0.4} \) | \( {80.1} \pm {0.3} \) |
| -10.0 | ✓ | \( {90.8} \pm {3.7} \) | \( {80.2} \pm {0.3} \) | \( {72.9} \pm {0.1} \) | \( {75.0} \pm {1.8} \) | \( {79.7} \pm {0.6} \) |
| -100.0 | ✓ | \( {72.0} \pm {11} \) | \( {76.3} \pm {2.2} \) | \( {73.0} \pm {1.1} \) | \( {69.1} \pm {2.6} \) | \( {72.6} \pm {4.0} \) |
Table 5.9: Performance comparison of different intra factors on the PACS dataset without (top) and with self-challenging (bottom), using training-domain validation and a ResNet-18 backbone. The feature and batch drop factors are kept constant.
In this work, we investigated whether explainability methods can be deployed during the training procedure to gain both better performance on the domain generalization task and a framework that offers users more explainability. In particular, we developed a regularization technique based on class activation maps, which visualize the parts of an image responsible for certain predictions (DivCAM), as well as prototypical representations that serve as a set of class or attribute centroids which the network uses to make its predictions (ProDROP and D-TRANSFORMERS).
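To make the class-activation-map idea behind DivCAM concrete, the following is a minimal Grad-CAM-style sketch, not the thesis implementation: each feature channel of the last convolutional layer is weighted by the spatially averaged gradient of the class score, the weighted channels are summed, and only positive evidence is kept. All array names here are illustrative.

```python
import numpy as np

def class_activation_map(feature_maps, gradients):
    """Grad-CAM-style localization map.

    feature_maps: (C, H, W) activations of the last conv layer
    gradients:    (C, H, W) gradients of the class score w.r.t. them
    Returns a (H, W) map normalized to [0, 1].
    """
    # Channel importance: spatially averaged gradient per channel.
    weights = gradients.mean(axis=(1, 2))                # (C,)
    # Weighted sum over channels collapses (C, H, W) to (H, W).
    cam = np.tensordot(weights, feature_maps, axes=1)    # (H, W)
    # Keep only regions that contribute positively to the class score.
    cam = np.maximum(cam, 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()                            # normalize to [0, 1]
    return cam
```

In a training-time regularizer such as DivCAM, a map like this would be computed per class and used to guide the network toward a diverse set of discriminative regions rather than only for post-hoc visualization.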
From the results and ablations presented in this work, we have shown that DivCAM in particular is a reliable method for achieving domain generalization with small to medium-sized ResNet backbones, while offering an architecture that allows additional insights into how and why the network arrives at its predictions. Depending on the backbone and dataset, ProDROP and D-TRANSFORMERS can also be powerful options for this cause. The explainability these methods offer is a property that is highly desirable in practice, especially for safety-critical scenarios such as self-driving cars, applications in the medical field, e.g., cancer or tumor prediction, or any other automation robot that needs to operate in a diverse set of environments. We hope that the presented methods can find application in such scenarios and establish additional trust and confidence that machine learning systems work reliably.
Building upon the methods presented in this work, there are a number of ablations and extension points that might be interesting to investigate. First, even though the results for ProDROP looked very promising on ResNet-18, it failed to generalize well enough to ResNet-50 to properly compete with the other methods in DomainBed. Looking into the details of this transition might yield a better understanding of what causes this problem, and a solution can probably be found. Secondly, even though the explainability methods we build upon have deep roots in the explainability literature, it would be interesting to either jointly train a suitable decoder for the prototypes or to visualize the closest latent patch across the whole dataset. Since we train with images coming from different domains, interesting visualizations could be possible, potentially also in a video format that shows the change throughout training. In this work, we focused especially on achieving good performance with these methods rather than on the explainability itself. Thirdly, many prototypical networks upscale the feature map in size, for example to
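The "closest latent patch" visualization mentioned above can be sketched simply: flatten every spatial position of every image's feature map into a patch embedding, then retrieve the patches nearest to a prototype. This is a hypothetical sketch under the assumption of Euclidean distance in latent space; the mapping from patch index back to (image, row, column) is bookkept externally.

```python
import numpy as np

def closest_patches(prototype, patch_embeddings, k=1):
    """Return the indices of the k latent patches closest to a prototype.

    prototype:        (D,) prototype vector
    patch_embeddings: (N, D) stack of all spatial feature vectors across
                      the dataset; each row index maps back to an
                      (image_id, row, col) triple kept elsewhere
    """
    # Euclidean distance from the prototype to every patch embedding.
    dists = np.linalg.norm(patch_embeddings - prototype, axis=1)
    # Indices of the k smallest distances, closest first.
    return np.argsort(dists)[:k]
```

With images coming from several domains, retrieving the top-k patches per prototype after every epoch would give exactly the kind of cross-domain, over-training visualization the text suggests.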
Finally, especially for D-TRANSFORMERS, our method of aggregating across multiple environments is very simple in nature, and finding a more elaborate way to utilize the multiple aligned prototypes for each environment might be a great opportunity for follow-up work. Nevertheless, we hope that our analysis can serve as a first stepping stone and groundwork for developing more domain generalization algorithms based on explainability methods.
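To illustrate what "simple aggregation" of per-environment prototypes means, the sketch below averages the aligned class prototypes of all training environments into one global prototype per class and classifies by nearest prototype. This is an assumed mean-based aggregation for illustration, not a verbatim transcription of the D-TRANSFORMERS implementation.

```python
import numpy as np

def aggregate_prototypes(env_prototypes):
    """Average aligned per-environment prototypes into global ones.

    env_prototypes: (E, K, D) array -- E environments, K classes,
                    D-dimensional prototype vectors
    Returns a (K, D) array of global class prototypes.
    """
    return env_prototypes.mean(axis=0)

def classify(features, prototypes):
    """Assign each feature vector to its nearest global prototype."""
    # Pairwise distances: (N, K) from (N, 1, D) minus (1, K, D).
    dists = np.linalg.norm(
        features[:, None, :] - prototypes[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

A richer follow-up, as the text suggests, could weight environments by similarity to the test distribution or keep the per-environment prototypes separate instead of collapsing them into a single mean.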
[1] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. M. Wallach. "A Reductions Approach to Fair Classification". In: International Conference on Machine Learning, ICML. 2018.
[1] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, 和 H. M. Wallach. “公平分类的归约方法”。发表于国际机器学习大会(ICML),2018年。
[2] K. Akuzawa, Y. Iwasawa, and Y. Matsuo. "Adversarial Invariant Feature Learning with Accuracy Constraint for Domain Generalization". In: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD. 2019.
[2] K. Akuzawa, Y. Iwasawa, 和 Y. Matsuo. “带准确性约束的对抗不变特征学习用于域泛化”。发表于欧洲机器学习与知识发现数据库会议(ECML PKDD),2019年。
[3] G. Alain and Y. Bengio. "Understanding intermediate layers using linear classifier probes". In: International Conference on Learning Representations, ICLR. 2017.
[3] G. Alain 和 Y. Bengio. “使用线性分类器探针理解中间层”。发表于:国际学习表征会议(ICLR),2017年。
[4] E. A. AlBadawy, A. Saha, and M. A. Mazurowski. "Deep learning for segmentation of brain tumors: Impact of cross-institutional training and testing". In: Medical Physics 45.3 (2018), pp. 1150-1158.
[4] E. A. AlBadawy, A. Saha 和 M. A. Mazurowski. “脑肿瘤分割的深度学习:跨机构训练和测试的影响”。发表于:医学物理学,45卷第3期(2018),第1150-1158页。
[5] I. Albuquerque, J. Monteiro, M. Darvishi, T. H. Falk, and I. Mitliagkas. Generalizing to unseen domains via distribution matching. 2019. arXiv: 1911.00804 [cs.LG].
[5] I. Albuquerque, J. Monteiro, M. Darvishi, T. H. Falk 和 I. Mitliagkas. 通过分布匹配实现对未见领域的泛化。2019年。arXiv: 1911.00804 [cs.LG]。
[6] D. Alvarez-Melis and T. S. Jaakkola. "Towards Robust Interpretability with Self-Explaining Neural Networks". In: Advances in Neural Information Processing Systems, NeurIPS. 2018.
[6] D. Alvarez-Melis 和 T. S. Jaakkola. “迈向具有自解释能力的神经网络的稳健可解释性”。发表于:神经信息处理系统进展(NeurIPS),2018年。
[7] S. Amershi, M. Chickering, S. Drucker, B. Lee, P. Simard, and J. Suh. "ModelTracker: Redesigning Performance Analysis Tools for Machine Learning". In: Conference on Human Factors in Computing Systems, CHI. 2015.
[7] S. Amershi, M. Chickering, S. Drucker, B. Lee, P. Simard 和 J. Suh. “ModelTracker:重新设计机器学习性能分析工具”。发表于:人因计算系统会议(CHI),2015年。
[8] S. Anwar and N. Barnes. "Real Image Denoising With Feature Attention". In: International Conference on Computer Vision, ICCV. 2019.
[8] S. Anwar 和 N. Barnes. “基于特征注意力的真实图像去噪”。发表于:国际计算机视觉会议(ICCV),2019年。
[9] M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz. Invariant Risk Minimization. 2019. arXiv: 1907.02893 [stat.ML].
[9] M. Arjovsky, L. Bottou, I. Gulrajani 和 D. Lopez-Paz. 不变风险最小化(Invariant Risk Minimization)。2019年。arXiv: 1907.02893 [stat.ML]。
[10] N. Asadi, A. M. Sarfi, M. Hosseinzadeh, Z. Karimpour, and M. Eftekhari. Towards Shape Biased Unsupervised Representation Learning for Domain Generalization. 2019. arXiv: 1909.08245 [cs.CV].
[10] N. Asadi, A. M. Sarfi, M. Hosseinzadeh, Z. Karimpour 和 M. Eftekhari. 面向领域泛化的形状偏置无监督表示学习。2019年。arXiv: 1909.08245 [cs.CV]。
[11] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek. "On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation". In: PLOS ONE 10.7 (2015), e0130140.
[11] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller 和 W. Samek. “基于层次相关传播的非线性分类器决策的像素级解释”。发表于:PLOS ONE,10卷第7期(2015),e0130140。
[12] W. Bae, J. Noh, and G. Kim. "Rethinking Class Activation Mapping for Weakly Supervised Object Localization". In: European Conference on Computer Vision, ECCV. 2020.
[12] W. Bae, J. Noh 和 G. Kim. “重新思考弱监督目标定位的类激活映射”。发表于:欧洲计算机视觉会议(ECCV),2020年。
[13] D. Bahdanau, K. Cho, and Y. Bengio. "Neural Machine Translation by Jointly Learning to Align and Translate". In: International Conference on Learning Representations, ICLR. 2015.
[13] D. Bahdanau, K. Cho 和 Y. Bengio. “通过联合学习对齐与翻译的神经机器翻译”。发表于:国际学习表征会议(ICLR),2015年。
[14] M. Baktashmotlagh, M. T. Harandi, B. C. Lovell, and M. Salzmann. "Unsupervised Domain Adaptation by Domain Invariant Projection". In: International Conference on Computer Vision, ICCV. 2013.
[14] M. Baktashmotlagh, M. T. Harandi, B. C. Lovell 和 M. Salzmann. “通过领域不变投影实现无监督领域适应”。发表于:国际计算机视觉会议(ICCV),2013年。
[15] Y. Balaji, S. Sankaranarayanan, and R. Chellappa. "MetaReg: Towards Domain Generalization using Meta-Regularization". In: Advances in Neural Information Processing Systems, NeurIPS. 2018.
[15] Y. Balaji, S. Sankaranarayanan 和 R. Chellappa. “MetaReg:基于元正则化的领域泛化方法”。发表于:神经信息处理系统进展(NeurIPS),2018年。
[16] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. "Clustering with Bregman Divergences". In: International Conference on Data Mining, SDM. 2004.
[16] A. Banerjee, S. Merugu, I. S. Dhillon 和 J. Ghosh. “基于Bregman散度的聚类”。发表于:国际数据挖掘会议(SDM),2004年。
[17] S. Beery, G. V. Horn, and P. Perona. "Recognition in Terra Incognita". In: European Conference on Computer Vision, ECCV. 2018.
[17] S. Beery, G. V. Horn 和 P. Perona. “在未知领域中的识别”。发表于:欧洲计算机视觉会议(ECCV),2018年。
[18] J. Bien and R. Tibshirani. "Prototype selection for interpretable classification". In: The Annals of Applied Statistics 5.4 (2011), pp. 2403-2424.
[18] J. Bien 和 R. Tibshirani. “用于可解释分类的原型选择”。发表于:应用统计年鉴,5卷第4期(2011),第2403-2424页。
[19] G. Blanchard, A. A. Deshmukh, U. Dogan, G. Lee, and C. Scott. Domain Generalization by Marginal Transfer Learning. 2017. arXiv: 1711.07910 [stat.ML].
[19] G. Blanchard, A. A. Deshmukh, U. Dogan, G. Lee, 和 C. Scott. 通过边际迁移学习实现领域泛化。2017。arXiv: 1711.07910 [stat.ML]。
[20] G. Blanchard, G. Lee, and C. Scott. "Generalizing from Several Related Classification Tasks to a New Unlabeled Sample". In: Advances in Neural Information Processing Systems, NIPS. 2011.
[20] G. Blanchard, G. Lee, 和 C. Scott. “从多个相关分类任务泛化到新的无标签样本”。载于:神经信息处理系统进展,NIPS。2011。
[21] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. "Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2017.
[21] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, 和 D. Krishnan. “基于生成对抗网络的无监督像素级领域自适应”。载于:计算机视觉与模式识别会议,CVPR。2017。
[22] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan. "Domain Separation Networks". In: Advances in Neural Information Processing Systems, NIPS. 2016.
[22] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, 和 D. Erhan. “领域分离网络”。载于:神经信息处理系统进展,NIPS。2016。
[23] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. "Language Models are Few-Shot Learners". In: Advances in Neural Information Processing Systems, NeurIPS. 2020.
[23] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, 和 D. Amodei. “语言模型是少样本学习者”。载于:神经信息处理系统进展,NeurIPS。2020。
[24] X. Cai, J. Shang, Z. Jin, F. Liu, B. Qiang, W. Xie, and L. Zhao. "DBGE: Employee Turnover Prediction Based on Dynamic Bipartite Graph Embedding". In: IEEE Access 8 (2020), pp. 10390- 10402.
[24] X. Cai, J. Shang, Z. Jin, F. Liu, B. Qiang, W. Xie, 和 L. Zhao. “DBGE:基于动态二分图嵌入的员工流失预测”。载于:IEEE Access 8 (2020), 页码 10390-10402。
[25] T. Calders, F. Kamiran, and M. Pechenizkiy. "Building Classifiers with Independency Constraints". In: International Conference on Data Mining, ICDM. 2009.
[25] T. Calders, F. Kamiran, 和 M. Pechenizkiy. “构建具有独立性约束的分类器”。载于:国际数据挖掘会议,ICDM。2009。
[26] O. Camburu, T. Rocktäschel, T. Lukasiewicz, and P. Blunsom. "e-SNLI: Natural Language Inference with Natural Language Explanations". In: Advances in Neural Information Processing Systems, NeurIPS. 2018.
[26] O. Camburu, T. Rocktäschel, T. Lukasiewicz, 和 P. Blunsom. “e-SNLI:带有自然语言解释的自然语言推理”。载于:神经信息处理系统进展,NeurIPS。2018。
[27] O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, and P. Blunsom. "Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations". In: Annual Meeting of the Association for Computational Linguistics, ACL. 2020.
[27] O. Camburu, B. Shillingford, P. Minervini, T. Lukasiewicz, 和 P. Blunsom. “做出决定!不一致自然语言解释的对抗生成”。载于:计算语言学协会年会,ACL。2020。
[28] F. M. Carlucci, A. D'Innocente, S. Bucci, B. Caputo, and T. Tommasi. "Domain Generalization by Solving Jigsaw Puzzles". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2019.
[28] F. M. Carlucci, A. D'Innocente, S. Bucci, B. Caputo, 和 T. Tommasi. “通过拼图游戏实现领域泛化”。载于:计算机视觉与模式识别会议,CVPR。2019。
[29] F. M. Carlucci, L. Porzi, B. Caputo, E. Ricci, and S. R. Bulò. "AutoDIAL: Automatic Domain Alignment Layers". In: International Conference on Computer Vision, ICCV. 2017.
[29] F. M. Carlucci, L. Porzi, B. Caputo, E. Ricci, 和 S. R. Bulò. “AutoDIAL:自动领域对齐层”。载于:国际计算机视觉会议,ICCV。2017。
[30] R. Caruana, S. Lawrence, and C. L. Giles. "Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping". In: Advances in Neural Information Processing Systems, NIPS. 2000.
[30] R. Caruana, S. Lawrence, 和 C. L. Giles. “神经网络中的过拟合:反向传播、共轭梯度和提前停止”。载于:神经信息处理系统进展,NIPS。2000。
[31] D. C. Castro, I. Walker, and B. Glocker. "Causality matters in medical imaging". In: Nature Communications 11.1 (2020).
[31] D. C. Castro, I. Walker, 和 B. Glocker. “因果关系在医学影像中的重要性”。载于:自然通讯 11.1 (2020)。
[32] C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, and J. Su. "This Looks Like That: Deep Learning for Interpretable Image Recognition". In: Advances in Neural Information Processing Systems, NeurIPS. 2019.
[32] C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, 和 J. Su. “这看起来像那个:用于可解释图像识别的深度学习”。载于:神经信息处理系统进展,NeurIPS。2019。
[33] M. J. Choi, J. J. Lim, A. Torralba, and A. S. Willsky. "Exploiting hierarchical context on a large database of object categories". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2010.
[33] M. J. Choi, J. J. Lim, A. Torralba, 和 A. S. Willsky. “利用大型物体类别数据库中的层次上下文”。载于:计算机视觉与模式识别会议,CVPR。2010。
[34] G. Csurka. "A Comprehensive Survey on Domain Adaptation for Visual Applications". In: Domain Adaptation in Computer Vision Applications. 2017.
[34] G. Csurka. “视觉应用领域自适应的综合调研”。载于:《计算机视觉应用中的领域自适应》。2017年。
[35] A. D'Innocente and B. Caputo. "Domain Generalization with Domain-Specific Aggregation Modules". In: German Conference on Pattern Recognition, GCPR. 2018.
[35] A. D'Innocente 和 B. Caputo。“具有领域特定聚合模块的领域泛化”。载于:德国模式识别会议,GCPR。2018年。
[36] D. Dai and L. V. Gool. "Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime". In: International Conference on Intelligent Transportation Systems, ITSC. 2018.
[36] D. Dai 和 L. V. Gool。“暗模型适应:从白天到夜晚的语义图像分割”。载于:智能交通系统国际会议,ITSC。2018年。
[37] Z. Deng, F. Ding, C. Dwork, R. Hong, G. Parmigiani, P. Patil, and P. Sur. Representation via Representations: Domain Generalization via Adversarially Learned Invariant Representations. 2020. arXiv: 2006.11478 [cs.LG].
[37] Z. Deng, F. Ding, C. Dwork, R. Hong, G. Parmigiani, P. Patil 和 P. Sur。通过表示实现表示:基于对抗学习的不变表示的领域泛化。2020年。arXiv: 2006.11478 [cs.LG]。
[38] A. A. Deshmukh, Y. Lei, S. Sharma, U. Dogan, J. W. Cutler, and C. Scott. A Generalization Error Bound for Multi-class Domain Generalization. 2019. arXiv: 1905.10392 [stat.ML].
[38] A. A. Deshmukh, Y. Lei, S. Sharma, U. Dogan, J. W. Cutler 和 C. Scott。多类领域泛化的一般化误差界。2019年。arXiv: 1905.10392 [stat.ML]。
[39] J. Devlin, M. Chang, K. Lee, and K. Toutanova. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT. 2019.
[39] J. Devlin, M. Chang, K. Lee 和 K. Toutanova。“BERT:用于语言理解的深度双向Transformer预训练”。载于:北美计算语言学协会人类语言技术会议,NAACL-HLT。2019年。
[40] T. DeVries and G. W. Taylor. Improved Regularization of Convolutional Neural Networks with Cutout. 2017. arXiv: 1708.04552 [cs.CV].
[40] T. DeVries 和 G. W. Taylor。使用Cutout改进卷积神经网络的正则化。2017年。arXiv: 1708.04552 [cs.CV]。
[41] Y. Ding, Y. Liu, H. Luan, and M. Sun. "Visualizing and Understanding Neural Machine Translation". In: Annual Meeting of the Association for Computational Linguistics, ACL. 2017.
[41] Y. Ding, Y. Liu, H. Luan 和 M. Sun。“神经机器翻译的可视化与理解”。载于:计算语言学协会年会,ACL。2017年。
[42] Z. Ding and Y. Fu. "Deep Domain Generalization With Structured Low-Rank Constraint". In: IEEE Transactions on Image Processing 27.1 (2018), pp. 304-313.
[42] Z. Ding 和 Y. Fu。“具有结构化低秩约束的深度领域泛化”。载于:IEEE图像处理汇刊 27卷1期 (2018),第304-313页。
[43] C. Doersch, A. Gupta, and A. Zisserman. "CrossTransformers: spatially-aware few-shot transfer". In: Advances in Neural Information Processing Systems, NeurIPS. 2020.
[43] C. Doersch, A. Gupta 和 A. Zisserman。“CrossTransformers:空间感知的少样本迁移”。载于:神经信息处理系统大会,NeurIPS。2020年。
[44] Y. Dong, H. Su, J. Zhu, and B. Zhang. "Improving Interpretability of Deep Neural Networks with Semantic Information". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2017.
[44] Y. Dong, H. Su, J. Zhu 和 B. Zhang。“利用语义信息提升深度神经网络的可解释性”。载于:计算机视觉与模式识别会议,CVPR。2017年。
[45] M. Donini, L. Oneto, S. Ben-David, J. Shawe-Taylor, and M. Pontil. "Empirical Risk Minimization Under Fairness Constraints". In: Advances in Neural Information Processing Systems, NeurIPS. 2018.
[45] M. Donini, L. Oneto, S. Ben-David, J. Shawe-Taylor 和 M. Pontil。“在公平性约束下的经验风险最小化”。载于:神经信息处理系统大会,NeurIPS。2018年。
[46] Q. Dou, D. C. de Castro, K. Kamnitsas, and B. Glocker. "Domain Generalization via Model-Agnostic Learning of Semantic Features". In: Advances in Neural Information Processing Systems, NeurIPS. 2019.
[46] Q. Dou, D. C. de Castro, K. Kamnitsas 和 B. Glocker。“通过模型无关的语义特征学习实现领域泛化”。载于:神经信息处理系统大会,NeurIPS。2019年。
[47] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. S. Zemel. "Fairness through awareness". In: Innovations in Theoretical Computer Science, ITCS. 2012.
[47] C. Dwork, M. Hardt, T. Pitassi, O. Reingold 和 R. S. Zemel。“通过意识实现公平”。载于:理论计算机科学创新会议,ITCS。2012年。
[48] C. Dwork, N. Immorlica, A. T. Kalai, and M. D. M. Leiserson. "Decoupled Classifiers for Group-Fair and Efficient Machine Learning". In: Conference on Fairness, Accountability and Transparency, FAT. 2018.
[48] C. Dwork, N. Immorlica, A. T. Kalai 和 M. D. M. Leiserson。“用于群体公平与高效机器学习的解耦分类器”。载于:公平性、问责制与透明度会议,FAT。2018年。
[49] M. Eitz, J. Hays, and M. Alexa. "How do humans sketch objects?" In: Transactions on Graphics, TOG 31.4 (2012), 44:1-44:10.
[49] M. Eitz, J. Hays 和 M. Alexa。“人类如何绘制物体草图?”载于:图形学汇刊,TOG 31卷4期 (2012),44:1-44:10。
[50] E. R. Elenberg, A. G. Dimakis, M. Feldman, and A. Karbasi. "Streaming Weak Submodularity: Interpreting Neural Networks on the Fly". In: Advances in Neural Information Processing Systems, NIPS. 2017.
[50] E. R. Elenberg, A. G. Dimakis, M. Feldman, 和 A. Karbasi. “流式弱子模性:即时解释神经网络”。载于:神经信息处理系统进展,NIPS。2017年。
[51] M. Everingham, L. V. Gool, C. K. I. Williams, J. M. Winn, and A. Zisserman. "The Pascal Visual Object Classes (VOC) Challenge". In: International Journal of Computer Vision, IJCV 88.2 (2010), pp. 303-338.
[51] M. Everingham, L. V. Gool, C. K. I. Williams, J. M. Winn, 和 A. Zisserman. “Pascal视觉对象类别(VOC)挑战”。载于:国际计算机视觉杂志,IJCV 88.2 (2010), 页303-338。
[52] C. Fang, Y. Xu, and D. N. Rockmore. "Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias". In: International Conference on Computer Vision, ICCV. 2013.
[52] C. Fang, Y. Xu, 和 D. N. Rockmore. “无偏度量学习:关于利用多个数据集和网络图像以减轻偏差”。载于:国际计算机视觉会议,ICCV。2013年。
[53] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian. "Certifying and Removing Disparate Impact". In: International Conference on Knowledge Discovery and Data Mining, SIGKDD. 2015.
[53] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, 和 S. Venkatasubramanian. “认证与消除差异影响”。载于:知识发现与数据挖掘国际会议,SIGKDD。2015年。
[54] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars. "Unsupervised Visual Domain Adaptation Using Subspace Alignment". In: International Conference on Computer Vision, ICCV. 2013.
[54] B. Fernando, A. Habrard, M. Sebban, 和 T. Tuytelaars. “基于子空间对齐的无监督视觉域适应”。载于:国际计算机视觉会议,ICCV。2013年。
[55] S. Fidler, S. J. Dickinson, and R. Urtasun. "3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model". In: Advances in Neural Information Processing Systems, NIPS. 2012.
[55] S. Fidler, S. J. Dickinson, 和 R. Urtasun. “基于可变形三维立方体模型的三维物体检测与视角估计”。载于:神经信息处理系统进展,NIPS。2012年。
[56] C. Finn, P. Abbeel, and S. Levine. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". In: International Conference on Machine Learning, ICML. 2017.
[56] C. Finn, P. Abbeel, 和 S. Levine. “模型无关元学习(Model-Agnostic Meta-Learning)用于深度网络的快速适应”。载于:国际机器学习会议,ICML。2017年。
[57] R. C. Fong and A. Vedaldi. "Interpretable Explanations of Black Boxes by Meaningful Perturbation". In: International Conference on Computer Vision, ICCV. 2017.
[57] R. C. Fong 和 A. Vedaldi. “通过有意义的扰动对黑盒模型进行可解释性说明”。载于:国际计算机视觉会议,ICCV。2017年。
[58] N. Frosst and G. E. Hinton. "Distilling a Neural Network Into a Soft Decision Tree". In: International Workshop on Comprehensibility and Explanation in AI and ML, CEX. 2017.
[59] F. B. Fuchs, O. Groth, A. R. Kosiorek, A. Bewley, M. Wulfmeier, A. Vedaldi, and I. Posner. "Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes". In: British Machine Vision Conference, BMVC. 2019.
[60] Y. Ganin and V. S. Lempitsky. "Unsupervised Domain Adaptation by Backpropagation". In: International Conference on Machine Learning, ICML. 2015.
[61] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. S. Lempitsky. "Domain-Adversarial Training of Neural Networks". In: Journal of Machine Learning Research, JMLR 17 (2016), 59:1-59:35.
[62] R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel. "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness". In: International Conference on Learning Representations, ICLR. 2019.
[63] M. Gharib, P. Lollini, M. Botta, E. G. Amparore, S. Donatelli, and A. Bondavalli. "On the Safety of Automotive Systems Incorporating Machine Learning Based Components: A Position Paper". In: International Conference on Dependable Systems and Networks, DSN. 2018.
[64] G. Ghiasi, T. Lin, and Q. V. Le. "DropBlock: A regularization method for convolutional networks". In: Advances in Neural Information Processing Systems, NeurIPS. 2018.
[65] M. Ghifary, D. Balduzzi, W. B. Kleijn, and M. Zhang. "Scatter Component Analysis: A Unified Framework for Domain Adaptation and Domain Generalization". In: Transactions on Pattern Analysis and Machine Intelligence, PAMI 39.7 (2017), pp. 1414-1430.
[66] M. Ghifary, W. B. Kleijn, M. Zhang, and D. Balduzzi. "Domain Generalization for Object Recognition with Multi-task Autoencoders". In: International Conference on Computer Vision, ICCV. 2015.
[67] M. Ghifary, W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li. "Deep Reconstruction-Classification Networks for Unsupervised Domain Adaptation". In: European Conference on Computer Vision, ECCV. 2016.
[68] B. Gong, K. Grauman, and F. Sha. "Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation". In: International Conference on Machine Learning, ICML. 2013.
[69] B. Gong, Y. Shi, F. Sha, and K. Grauman. "Geodesic flow kernel for unsupervised domain adaptation". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2012.
[70] P. Gordaliza, E. del Barrio, F. Gamboa, and J. Loubes. "Obtaining Fairness using Optimal Transport Theory". In: International Conference on Machine Learning, ICML. 2019.
[71] A. Graves, G. Wayne, and I. Danihelka. Neural Turing Machines. 2014. arXiv: 1410.5401 [cs.NE].
[72] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. J. Smola. "A Kernel Two-Sample Test". In: Journal of Machine Learning Research, JMLR 13 (2012), pp. 723-773.
[73] I. Gulrajani and D. Lopez-Paz. "In Search of Lost Domain Generalization". In: International Conference on Learning Representations, ICLR. 2021.
[74] Z. Guo, T. He, Z. Qin, Z. Xie, and J. Liu. "A Content-Based Recommendation Framework for Judicial Cases". In: International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE. 2019.
[75] M. Hardt, E. Price, and N. Srebro. "Equality of Opportunity in Supervised Learning". In: Advances in Neural Information Processing Systems, NIPS. 2016.
[76] M. Harradon, J. Druce, and B. Ruttenberg. Causal Learning and Explanation of Deep Neural Networks via Autoencoded Activations. 2018. arXiv: 1802.00541 [cs.AI].
[77] K. He, X. Zhang, S. Ren, and J. Sun. "Deep Residual Learning for Image Recognition". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2016.
[78] H. Heidari, C. Ferrari, K. P. Gummadi, and A. Krause. "Fairness Behind a Veil of Ignorance: A Welfare Analysis for Automated Decision Making". In: Advances in Neural Information Processing Systems, NeurIPS. 2018.
[79] L. A. Hendricks, Z. Akata, M. Rohrbach, J. Donahue, B. Schiele, and T. Darrell. "Generating Visual Explanations". In: European Conference on Computer Vision, ECCV. 2016.
[80] D. Hendrycks and T. G. Dietterich. "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". In: International Conference on Learning Representations, ICLR. 2019.
[81] M. Hind, D. Wei, M. Campbell, N. C. F. Codella, A. Dhurandhar, A. Mojsilovic, K. N. Ramamurthy, and K. R. Varshney. "TED: Teaching AI to Explain its Decisions". In: Conference on AI, Ethics, and Society, AIES. 2019.
[82] B. Hou and Z. Zhou. "Learning With Interpretable Structure From Gated RNN". In: Transactions on Neural Networks and Learning Systems 31.7 (2020), pp. 2267-2279.
[83] S. Hu, K. Zhang, Z. Chen, and L. Chan. "Domain Generalization via Multidomain Discriminant Analysis". In: Conference on Uncertainty in Artificial Intelligence, UAI. 2019.
[84] W. Hu, G. Niu, I. Sato, and M. Sugiyama. "Does Distributionally Robust Supervised Learning Give Robust Classifiers?" In: International Conference on Machine Learning, ICML. 2018.
[85] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf. "Correcting Sample Selection Bias by Unlabeled Data". In: Advances in Neural Information Processing Systems, NIPS. 2006.
[86] Z. Huang, H. Wang, E. P. Xing, and D. Huang. "Self-Challenging Improves Cross-Domain Generalization". In: European Conference on Computer Vision, ECCV. 2020.
[87] D. A. Hudson and C. D. Manning. "Compositional Attention Networks for Machine Reasoning". In: International Conference on Learning Representations, ICLR. 2018.
[88] M. Ilse, J. M. Tomczak, C. Louizos, and M. Welling. "DIVA: Domain Invariant Variational Autoencoders". In: International Conference on Learning Representations, ICLR. 2019.
[89] S. Ioffe and C. Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". In: International Conference on Machine Learning, ICML. 2015.
[90] R. Iyer, Y. Li, H. Li, M. Lewis, R. Sundar, and K. P. Sycara. "Transparency and Explanation in Deep Reinforcement Learning Neural Networks". In: Conference on AI, Ethics, and Society, AIES. 2018.
[91] S. Jain and B. C. Wallace. "Attention is not Explanation". In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT. 2019.
[92] Y. Jia, J. Zhang, S. Shan, and X. Chen. "Single-Side Domain Generalization for Face Anti-Spoofing". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2020.
[93] H. Jiang, B. Kim, M. Y. Guan, and M. R. Gupta. "To Trust Or Not To Trust A Classifier". In: Advances in Neural Information Processing Systems, NeurIPS. 2018.
[94] X. Jin, C. Lan, W. Zeng, and Z. Chen. Feature Alignment and Restoration for Domain Generalization and Adaptation. 2020. arXiv: 2006.12009 [cs.CV].
[95] D. Kang, D. Raghavan, P. Bailis, and M. Zaharia. "Model Assertions for Monitoring and Improving ML Models". In: Conference on Machine Learning and Systems, MLSys. 2020.
[96] A. Khosla, T. Zhou, T. Malisiewicz, A. A. Efros, and A. Torralba. "Undoing the Damage of Dataset Bias". In: European Conference on Computer Vision, ECCV. 2012.
[97] B. Kim, C. Rudin, and J. A. Shah. "The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification". In: Advances in Neural Information Processing Systems, NIPS. 2014.
[98] D. P. Kingma and J. L. Ba. "Adam: A Method for Stochastic Optimization". In: International Conference on Learning Representations, ICLR. 2015.
[99] D. P. Kingma and M. Welling. "Auto-Encoding Variational Bayes". In: International Conference on Learning Representations, ICLR. 2014.
[100] A. Krizhevsky, I. Sutskever, and G. E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems, NIPS. 2012.
[101] D. Krueger, E. Caballero, J.-H. Jacobsen, A. Zhang, J. Binas, R. L. Priol, and A. Courville. Out-of-Distribution Generalization via Risk Extrapolation (REx). 2020. arXiv: 2003.00688 [cs.LG].
[102] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut. "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". In: International Conference on Learning Representations, ICLR. 2019.
[103] S. Lapuschkin, A. Binder, G. Montavon, K. Müller, and W. Samek. "Analyzing Classifiers: Fisher Vectors and Deep Neural Networks". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2016.
[104] G. Larsson, M. Maire, and G. Shakhnarovich. "FractalNet: Ultra-Deep Neural Networks without Residuals". In: International Conference on Learning Representations, ICLR. 2017.
[105] Y. LeCun and C. Cortes. MNIST handwritten digit database. 2010.
[106] T. Lei, R. Barzilay, and T. S. Jaakkola. "Rationalizing Neural Predictions". In: Conference on Empirical Methods in Natural Language Processing, EMNLP. 2016.
[107] D. Li, Y. Yang, Y. Song, and T. M. Hospedales. "Deeper, Broader and Artier Domain Generalization". In: International Conference on Computer Vision, ICCV. 2017.
[108] D. Li, Y. Yang, Y. Song, and T. M. Hospedales. "Learning to Generalize: Meta-Learning for Domain Generalization". In: AAAI Conference on Artificial Intelligence, AAAI. 2018.
[109] D. Li, Y. Yang, Y.-Z. Song, and T. Hospedales. "Sequential Learning for Domain Generalization". In: European Conference on Computer Vision, ECCV. 2020.
[110] D. Li, J. Zhang, Y. Yang, C. Liu, Y. Song, and T. M. Hospedales. "Episodic Training for Domain Generalization". In: International Conference on Computer Vision, ICCV. 2019.
[111] F. Li, R. Fergus, and P. Perona. "Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories". In: Computer Vision and Image Understanding 106.1 (2007), pp. 59-70.
[112] H. Li, S. J. Pan, S. Wang, and A. C. Kot. "Domain Generalization With Adversarial Feature Learning". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2018.
[113] J. Li, W. Monroe, and D. Jurafsky. Understanding Neural Networks through Representation Erasure. 2016. arXiv: 1612.08220 [cs.CL].
[114] O. Li, H. Liu, C. Chen, and C. Rudin. "Deep Learning for Case-Based Reasoning Through Prototypes: A Neural Network That Explains Its Predictions". In: AAAI Conference on Artificial Intelligence, AAAI. 2018.
[115] Y. Li, M. Gong, X. Tian, T. Liu, and D. Tao. "Domain Generalization via Conditional Invariant Representations". In: AAAI Conference on Artificial Intelligence, AAAI. 2018.
[116] Y. Li, X. Tian, M. Gong, Y. Liu, T. Liu, K. Zhang, and D. Tao. "Deep Domain Generalization via Conditional Invariant Adversarial Networks". In: European Conference on Computer Vision, ECCV. 2018.
[117] Y. Li, Y. Yang, W. Zhou, and T. M. Hospedales. "Feature-Critic Networks for Heterogeneous Domain Generalization". In: International Conference on Machine Learning, ICML. 2019.
[118] M. Lin, Q. Chen, and S. Yan. "Network In Network". In: International Conference on Learning Representations, ICLR. 2014.
[119] M. Long, G. Ding, J. Wang, J. Sun, Y. Guo, and P. S. Yu. "Transfer Sparse Coding for Robust Image Representation". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2013.
[120] C. Louizos, K. Swersky, Y. Li, M. Welling, and R. S. Zemel. "The Variational Fair Autoencoder". In: International Conference on Learning Representations, ICLR. 2016.
[121] T. Luong, H. Pham, and C. D. Manning. "Effective Approaches to Attention-based Neural Machine Translation". In: Conference on Empirical Methods in Natural Language Processing, EMNLP. 2015.
[122] D. Mahajan, S. Tople, and A. Sharma. Domain Generalization using Causal Matching. 2020. arXiv: 2006.07500 [cs.LG].
[123] M. Mancini, Z. Akata, E. Ricci, and B. Caputo. "Towards Recognizing Unseen Categories in Unseen Domains". In: European Conference on Computer Vision, ECCV. 2020.
[124] M. Mancini, S. R. Bulò, B. Caputo, and E. Ricci. "Best Sources Forward: Domain Generalization through Source-Specific Nets". In: International Conference on Image Processing, ICIP. 2018.
[125] M. Mancini, S. R. Bulò, B. Caputo, and E. Ricci. "Robust Place Categorization With Deep Domain Generalization". In: IEEE Robotics and Automation Letters 3.3 (2018), pp. 2093-2100.
[126] M. Mancini, L. Porzi, S. R. Bulò, B. Caputo, and E. Ricci. "Boosting Domain Adaptation by Discovering Latent Domains". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2018.
[127] T. Matsuura and T. Harada. "Domain Generalization Using a Mixture of Multiple Latent Domains". In: AAAI Conference on Artificial Intelligence, AAAI. 2020.
[128] C. Molnar. Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/. 2019.
[129] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K. Müller. "Explaining nonlinear classification decisions with deep Taylor decomposition". In: Pattern Recognition 65 (2017), pp. 211-222.
[130] P. Morerio, J. Cavazza, R. Volpi, R. Vidal, and V. Murino. "Curriculum Dropout". In: International Conference on Computer Vision, ICCV. 2017.
[131] S. Motiian, M. Piccirilli, D. A. Adjeroh, and G. Doretto. "Unified Deep Supervised Domain Adaptation and Generalization". In: International Conference on Computer Vision, ICCV. 2017.
[132] K. Muandet, D. Balduzzi, and B. Schölkopf. "Domain Generalization via Invariant Feature Representation". In: International Conference on Machine Learning, ICML. 2013.
[133] K. Muandet, K. Fukumizu, B. K. Sriperumbudur, and B. Schölkopf. "Kernel Mean Embedding of Distributions: A Review and Beyond". In: Foundations and Trends in Machine Learning 10.1-2 (2017), pp. 1-141.
[134] W. J. Murdoch and A. Szlam. "Automatic Rule Extraction from Long Short Term Memory Networks". In: International Conference on Learning Representations, ICLR. 2017.
[135] H. Nam, H. Lee, J. Park, W. Yoon, and D. Yoo. Reducing Domain Gap via Style-Agnostic Networks. 2019. arXiv: 1910.11645 [cs.CV].
[136] S. J. Nowlan and G. E. Hinton. "Simplifying Neural Networks by Soft Weight-Sharing". In: Neural Computation 4.4 (1992), pp. 473-493.
[137] O. Nuriel, S. Benaim, and L. Wolf. Permuted AdaIN: Enhancing the Representation of Local Cues in Image Classifiers. 2020. arXiv: 2010.05785 [cs.CV].
[138] D. H. Park, L. A. Hendricks, Z. Akata, A. Rohrbach, B. Schiele, T. Darrell, and M. Rohrbach. "Multimodal Explanations: Justifying Decisions and Pointing to the Evidence". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2018.
[139] S. Park and N. Kwak. "Analysis on the Dropout Effect in Convolutional Neural Networks". In: Asian Conference on Computer Vision, ACCV. 2016.
[140] F. Pasa, V. Golkov, F. Pfeiffer, D. Cremers, and D. Pfeiffer. "Efficient Deep Network Architectures for Fast Chest X-Ray Tuberculosis Screening and Visualization". In: Scientific Reports 9.1 (2019).
[141] X. Peng, Q. Bai, X. Xia, Z. Huang, K. Saenko, and B. Wang. "Moment Matching for Multi-Source Domain Adaptation". In: International Conference on Computer Vision, ICCV. 2019.
[142] C. S. Perone, P. L. Ballester, R. C. Barros, and J. Cohen-Adad. "Unsupervised domain adaptation for medical imaging segmentation with self-ensembling". In: NeuroImage 194 (2019), pp. 1-11.
[143] J. Peters, P. Bühlmann, and N. Meinshausen. "Causal inference using invariant prediction: identification and confidence intervals". In: Journal of the Royal Statistical Society, Series B (Statistical Methodology) 78.5 (2016), pp. 947-1012.
[144] F. du Pin Calmon, D. Wei, B. Vinzamuri, K. N. Ramamurthy, and K. R. Varshney. "Optimized Pre-Processing for Discrimination Prevention". In: Advances in Neural Information Processing Systems, NIPS. 2017.
[145] V. Piratla, P. Netrapalli, and S. Sarawagi. "Efficient Domain Generalization via Common-Specific Low-Rank Decomposition". In: International Conference on Machine Learning, ICML. 2020.
[146] G. Pleiss, M. Raghavan, F. Wu, J. M. Kleinberg, and K. Q. Weinberger. "On Fairness and Calibration". In: Advances in Neural Information Processing Systems, NIPS. 2017.
[147] C. E. Priebe, D. J. Marchette, J. DeVinney, and D. A. Socolinsky. "Classification Using Class Cover Catch Digraphs". In: Journal of Classification 20.1 (2003), pp. 003-023.
[148] F. Qiao, L. Zhao, and X. Peng. "Learning to Learn Single Domain Generalization". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2020.
[149] C. Qin, H. Zhu, T. Xu, C. Zhu, L. Jiang, E. Chen, and H. Xiong. "Enhancing Person-Job Fit for Talent Recruitment: An Ability-aware Neural Network Approach". In: SIGIR Conference on Research & Development in Information Retrieval, SIGIR. 2018.
[150] M. M. Rahman, C. Fookes, M. Baktashmotlagh, and S. Sridharan. "Correlation-aware adversarial domain adaptation and generalization". In: Pattern Recognition 100 (2020), p. 107124.
[151] M. M. Rahman, C. Fookes, M. Baktashmotlagh, and S. Sridharan. "Multi-Component Image Translation for Deep Domain Generalization". In: Winter Conference on Applications of Computer Vision, WACV. 2019.
[152] H. Ramsauer, B. Schäfl, J. Lehner, P. Seidl, M. Widrich, L. Gruber, M. Holzleitner, M. Pavlović, G. K. Sandve, V. Greiff, D. Kreil, M. Kopp, G. Klambauer, J. Brandstetter, and S. Hochreiter. "Hopfield Networks is All You Need". In: International Conference on Learning Representations, ICLR. 2021.
[153] M. T. Ribeiro, S. Singh, and C. Guestrin. ""Why Should I Trust You?": Explaining the Predictions of Any Classifier". In: International Conference on Knowledge Discovery and Data Mining, SIGKDD. 2016.
[154] M. T. Ribeiro, S. Singh, and C. Guestrin. "Anchors: High-Precision Model-Agnostic Explanations". In: AAAI Conference on Artificial Intelligence, AAAI. 2018.
[155] H. Robbins and S. Monro. "A Stochastic Approximation Method". In: The Annals of Mathematical Statistics 22.3 (1951), pp. 400-407.
[156] M. Robnik-Sikonja and I. Kononenko. "Explaining Classifications For Individual Instances". In: IEEE Transactions on Knowledge and Data Engineering 20.5 (2008), pp. 589-600.
[157] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and F. Li. "ImageNet Large Scale Visual Recognition Challenge". In: International Journal of Computer Vision, IJCV 115.3 (2015), pp. 211-252.
[158] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman. "LabelMe: A Database and Web-Based Tool for Image Annotation". In: International Journal of Computer Vision, IJCV 77.1-3 (2008), pp. 157-173.
[159] S. Sagawa, P. W. Koh, T. B. Hashimoto, and P. Liang. "Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization". In: International Conference on Learning Representations, ICLR. 2020.
[160] P. Sangkloy, N. Burnell, C. Ham, and J. Hays. "The sketchy database: learning to retrieve badly drawn bunnies". In: Transactions on Graphics, TOG 35.4 (2016), 119:1-119:12.
[161] J. Schmidhuber, J. Zhao, and M. A. Wiering. "Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement". In: Machine Learning 28.1 (1997), pp. 105-130.
[162] R. M. Schmidt. Recurrent Neural Networks (RNNs): A gentle Introduction and Overview. 2019. arXiv: 1912.05911 [cs.LG].
[163] R. M. Schmidt, F. Schneider, and P. Hennig. Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers. 2020. arXiv: 2007.01547 [cs.LG].
[164] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization". In: International Conference on Computer Vision, ICCV. 2017.
[165] C. Sen, T. Hartvigsen, B. Yin, X. Kong, and E. A. Rundensteiner. "Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words?" In: Annual Meeting of the Association for Computational Linguistics, ACL. 2020.
[166] S. Seo, Y. Suh, D. Kim, G. Kim, J. Han, and B. Han. "Learning to Optimize Domain Specific Normalization for Domain Generalization". In: European Conference on Computer Vision, ECCV. 2020.
[166] S. Seo, Y. Suh, D. Kim, G. Kim, J. Han, 和 B. Han. “学习优化领域特定归一化以实现领域泛化”。发表于:欧洲计算机视觉会议(ECCV),
[167] S. Shankar, V. Piratla, S. Chakrabarti, S. Chaudhuri, P. Jyothi, and S. Sarawagi. "Generalizing Across Domains via Cross-Gradient Training". In: International Conference on Learning Representations, ICLR. 2018.
[168] A. Shrikumar, P. Greenside, and A. Kundaje. "Learning Important Features Through Propagating Activation Differences". In: International Conference on Machine Learning, ICML. 2017.
[169] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. "Learning from Simulated and Unsupervised Images through Adversarial Training". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2017.
[170] K. Simonyan, A. Vedaldi, and A. Zisserman. "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps". In: International Conference on Learning Representations, ICLR. 2014.
[171] K. Simonyan and A. Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition". In: International Conference on Learning Representations, ICLR. 2015.
[172] K. K. Singh and Y. J. Lee. "Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization". In: International Conference on Computer Vision, ICCV. 2017.
[173] J. Snell, K. Swersky, and R. S. Zemel. "Prototypical Networks for Few-shot Learning". In: Advances in Neural Information Processing Systems, NIPS. 2017.
[174] N. Somavarapu, C.-Y. Ma, and Z. Kira. Frustratingly Simple Domain Generalization via Image Stylization. 2020. arXiv: 2006.11207 [cs.CV].
[175] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. A. Riedmiller. "Striving for Simplicity: The All Convolutional Net". In: International Conference on Learning Representations, ICLR. 2015.
[176] B. K. Sriperumbudur, K. Fukumizu, A. Gretton, G. R. G. Lanckriet, and B. Schölkopf. "Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions". In: Advances in Neural Information Processing Systems, NIPS. 2009.
[177] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. "Dropout: a simple way to prevent neural networks from overfitting". In: Journal of Machine Learning Research, JMLR 15.1 (2014), pp. 1929-1958.
[178] P. Stock and M. Cissé. "ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and Uncovering Biases". In: European Conference on Computer Vision, ECCV. 2018.
[179] B. Sun and K. Saenko. "Deep CORAL: Correlation Alignment for Deep Domain Adaptation". In: European Conference on Computer Vision, ECCV. 2016.
[180] G. Sun, S. Khan, W. Li, H. Cholakkal, F. Khan, and L. V. Gool. "Fixing Localization Errors to Improve Image Classification". In: European Conference on Computer Vision, ECCV. 2020.
[181] M. Sundararajan, A. Taly, and Q. Yan. "Axiomatic Attribution for Deep Networks". In: International Conference on Machine Learning, ICML. 2017.
[182] S. Tan, R. Caruana, G. Hooker, P. Koch, and A. Gordo. Learning Global Additive Explanations for Neural Nets Using Model Distillation. 2018. arXiv: 1801.08640 [stat.ML].
[183] S. Thrun and L. Y. Pratt. Learning to Learn. Springer, 1998.
[184] I. O. Tolstikhin, O. Bousquet, S. Gelly, and B. Schölkopf. "Wasserstein Auto-Encoders". In: International Conference on Learning Representations, ICLR. 2018.
[185] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler. "Efficient object localization using Convolutional Networks". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2015.
[186] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. "Adversarial Discriminative Domain Adaptation". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2017.
[187] V. Vapnik. Statistical Learning Theory. Wiley, 1998.
[188] K. R. Varshney and H. Alemzadeh. "On the Safety of Machine Learning: Cyber-Physical Systems, Decision Sciences, and Data Products". In: Big Data 5.3 (2017), pp. 246-255.
[189] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. "Attention is All you Need". In: Advances in Neural Information Processing Systems, NIPS. 2017.
[190] H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan. "Deep Hashing Network for Unsupervised Domain Adaptation". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2017.
[191] G. Volk, S. Müller, A. von Bernuth, D. Hospach, and O. Bringmann. "Towards Robust CNN-based Object Detection through Augmentation with Synthetic Rain Variations". In: International Conference on Intelligent Transportation Systems Conference, ITSC. 2019.
[192] R. Volpi and V. Murino. "Addressing Model Vulnerability to Distributional Shifts Over Image Transformation Sets". In: International Conference on Computer Vision, ICCV. 2019.
[193] R. Volpi, H. Namkoong, O. Sener, J. C. Duchi, V. Murino, and S. Savarese. "Generalizing to Unseen Domains via Adversarial Data Augmentation". In: Advances in Neural Information Processing Systems, NeurIPS. 2018.
[194] H. Wang, Z. He, Z. C. Lipton, and E. P. Xing. "Learning Robust Representations by Projecting Superficial Statistics Out". In: International Conference on Learning Representations, ICLR. 2019.
[195] Y. Wang, H. Li, and A. C. Kot. "Heterogeneous Domain Generalization Via Domain Mixup". In: International Conference on Acoustics, Speech and Signal Processing, ICASSP. 2020.
[196] S. Wiegreffe and Y. Pinter. "Attention is not not Explanation". In: Conference on Empirical Methods in Natural Language Processing, EMNLP. 2019.
[197] B. E. Woodworth, S. Gunasekar, M. I. Ohannessian, and N. Srebro. "Learning Non-Discriminatory Predictors". In: Conference on Learning Theory, COLT. 2017.
[198] C. Wu and E. G. Tabak. Prototypal Analysis and Prototypal Regression. 2017. arXiv: 1701.08916 [stat.ML].
[199] N. Xie, G. Ras, M. van Gerven, and D. Doran. Explainable Deep Learning: A Field Guide for the Uninitiated. 2020. arXiv: 2004.14545 [cs.LG].
[200] M. Xu, J. Zhang, B. Ni, T. Li, C. Wang, Q. Tian, and W. Zhang. "Adversarial Domain Adaptation with Domain Mixup". In: AAAI Conference on Artificial Intelligence, AAAI. 2020.
[201] W. Xu, Y. Xian, J. Wang, B. Schiele, and Z. Akata. "Attribute Prototype Network for Zero-Shot Learning". In: Advances in Neural Information Processing Systems, NeurIPS. 2020.
[202] M. Yamada, L. Sigal, and M. Raptis. "No Bias Left behind: Covariate Shift Adaptation for Discriminative 3D Pose Estimation". In: European Conference on Computer Vision, ECCV. 2012.
[203] S. Yan, H. Song, N. Li, L. Zou, and L. Ren. Improve Unsupervised Domain Adaptation with Mixup Training. 2020. arXiv: 2001.00677 [stat.ML].
[204] M. B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. P. Gummadi. "Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment". In: International Conference on World Wide Web, WWW. 2017.
[205] S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M. Yang, and L. Shao. "CycleISP: Real Image Restoration via Improved Data Synthesis". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2020.
[206] M. D. Zeiler and R. Fergus. "Visualizing and Understanding Convolutional Networks". In: European Conference on Computer Vision, ECCV. 2014.
[207] R. Zellers, Y. Bisk, A. Farhadi, and Y. Choi. "From Recognition to Cognition: Visual Commonsense Reasoning". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2019.
[208] R. S. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. "Learning Fair Representations". In: International Conference on Machine Learning, ICML. 2013.
[209] X. Zeng, W. Ouyang, M. Wang, and X. Wang. "Deep Learning of Scene-Specific Classifier for Pedestrian Detection". In: European Conference on Computer Vision, ECCV. 2014.
[210] H. Zhang, M. Cissé, Y. N. Dauphin, and D. Lopez-Paz. "mixup: Beyond Empirical Risk Minimization". In: International Conference on Learning Representations, ICLR. 2018.
[211] L. Zhang, X. Wang, D. Yang, T. Sanford, S. Harmon, B. Turkbey, H. Roth, A. Myronenko, D. Xu, and Z. Xu. When Unseen Domain Generalization is Unnecessary? Rethinking Data Augmentation. 2019. arXiv: 1906.03347 [cs.CV].
[212] M. Zhang, H. Marklund, N. Dhawan, A. Gupta, S. Levine, and C. Finn. Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift. 2020. arXiv: 2007.02931 [cs.LG].
[213] Q. Zhang, R. Cao, F. Shi, Y. N. Wu, and S. Zhu. "Interpreting CNN Knowledge via an Explanatory Graph". In: AAAI Conference on Artificial Intelligence, AAAI. 2018.
[214] Q. Zhang, R. Cao, Y. N. Wu, and S. Zhu. "Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning". In: AAAI Conference on Artificial Intelligence, AAAI. 2017.
[215] Q. Zhang, Y. Yang, H. Ma, and Y. N. Wu. "Interpreting CNNs via Decision Trees". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2019.
[216] Y. Zhao, M. K. Hryniewicki, F. Cheng, B. Fu, and X. Zhu. "Employee Turnover Prediction with Machine Learning: A Reliable Approach". In: Intelligent Systems Conference, IntelliSys. 2018.
[217] B. Zhou, A. Khosla, À. Lapedriza, A. Oliva, and A. Torralba. "Learning Deep Features for Discriminative Localization". In: Conference on Computer Vision and Pattern Recognition, CVPR. 2016.
[218] K. Zhou, Y. Yang, T. Hospedales, and T. Xiang. "Learning to Generate Novel Domains for Domain Generalization". In: European Conference on Computer Vision, ECCV. 2020.
[219] K. Zhou, Y. Yang, T. M. Hospedales, and T. Xiang. "Deep Domain-Adversarial Image Generation for Domain Generalisation". In: AAAI Conference on Artificial Intelligence, AAAI. 2020.
[220] L. M. Zintgraf, T. S. Cohen, T. Adel, and M. Welling. "Visualizing Deep Neural Network Decisions: Prediction Difference Analysis". In: International Conference on Learning Representations, ICLR. 2017.
[221] A. Zunino, S. A. Bargal, R. Volpi, M. Sameki, J. Zhang, S. Sclaroff, V. Murino, and K. Saenko. Explainable Deep Classification Models for Domain Generalization. 2020. arXiv: 2003.06498 [cs.CV].
Since Table 5.2 only shows the average performance for each dataset, we provide the full per-domain results here for every dataset. The results for all algorithms other than RSC, DivCAM, and D-TRANSFORMERS are taken from DOMAINBED.
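To illustrate how each table cell is formed, the sketch below recomputes the ERM row of Table A.1 from per-trial accuracies. The individual trial values are invented for illustration (only their means are chosen to match the ERM row); the "Avg." column is simply the unweighted mean of the per-domain means.

```python
import statistics

# Hypothetical per-trial test accuracies (%) for ERM on VLCS; the trial
# values are invented, chosen so their means match the ERM row of Table A.1.
trials = {
    "C": [97.7, 98.1, 97.3],
    "L": [64.3, 63.4, 65.2],
    "S": [73.4, 72.9, 73.9],
    "V": [74.6, 75.9, 73.3],
}

# Each cell reports the mean over trials together with its spread (the +/- term).
cells = {d: (statistics.mean(a), statistics.stdev(a)) for d, a in trials.items()}

# The "Avg." column is the unweighted mean of the four per-domain means.
avg = statistics.mean(m for m, _ in cells.values())

print({d: round(m, 1) for d, (m, _) in cells.items()})  # {'C': 97.7, 'L': 64.3, 'S': 73.4, 'V': 74.6}
print(round(avg, 1))  # 77.5
```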
| Algorithm | C | L | S | V | \( \mathbf{{Avg}.} \) |
| ERM | \( {97.7} \pm {0.4} \) | \( {64.3} \pm {0.9} \) | \( {73.4} \pm {0.5} \) | \( {74.6} \pm {1.3} \) | 77.5 |
| IRM | \( {98.6} \pm {0.1} \) | \( {64.9} \pm {0.9} \) | \( {73.4} \pm {0.6} \) | \( {77.3} \pm {0.9} \) | 78.5 |
| GroupDRO | \( {97.3} \pm {0.3} \) | \( {63.4} \pm {0.9} \) | \( {69.5} \pm {0.8} \) | \( {76.7} \pm {0.7} \) | 76.7 |
| Mixup | \( {98.3} \pm {0.6} \) | \( {64.8} \pm {1.0} \) | \( {72.1} \pm {0.5} \) | \( {74.3} \pm {0.8} \) | 77.4 |
| MLDG | \( {97.4} \pm {0.2} \) | \( {65.2} \pm {0.7} \) | \( {71.0} \pm {1.4} \) | \( {75.3} \pm {1.0} \) | 77.2 |
| CORAL | \( {98.3} \pm {0.1} \) | \( {66.1} \pm {1.2} \) | \( {73.4} \pm {0.3} \) | \( {77.5} \pm {1.2} \) | 78.8 |
| MMD | \( {97.7} \pm {0.1} \) | \( {64.0} \pm {1.1} \) | \( {72.8} \pm {0.2} \) | \( {75.3} \pm {3.3} \) | 77.5 |
| DANN | \( {99.0} \pm {0.3} \) | \( {65.1} \pm {1.4} \) | \( {73.1} \pm {0.3} \) | \( {77.2} \pm {0.6} \) | 78.6 |
| CDANN | \( {97.1} \pm {0.3} \) | \( {65.1} \pm {1.2} \) | \( {70.7} \pm {0.8} \) | \( {77.1} \pm {1.5} \) | 77.5 |
| MTL | \( {97.8} \pm {0.4} \) | \( {64.3} \pm {0.3} \) | \( {71.5} \pm {0.7} \) | \( {75.3} \pm {1.7} \) | 77.2 |
| SagNet | \( {97.9} \pm {0.4} \) | \( {64.5} \pm {0.5} \) | \( {71.4} \pm {1.3} \) | \( {77.5} \pm {0.5} \) | 77.8 |
| ARM | \( {98.7} \pm {0.2} \) | \( {63.6} \pm {0.7} \) | \( {71.3} \pm {1.2} \) | \( {76.7} \pm {0.6} \) | 77.6 |
| VREx | \( {98.4} \pm {0.3} \) | \( {64.4} \pm {1.4} \) | \( {74.1} \pm {0.4} \) | \( {76.2} \pm {1.3} \) | 78.3 |
| RSC | \( {97.9} \pm {0.1} \) | \( {62.5} \pm {0.7} \) | \( {72.3} \pm {1.2} \) | \( {75.6} \pm {0.8} \) | 77.1 |
| DivCAM-S | \( {98.7} \pm {0.1} \) | \( {64.5} \pm {1.1} \) | \( {72.5} \pm {0.7} \) | \( {75.5} \pm {0.4} \) | 77.8 |
| D-TRANSFORMERS | \( {98.1} \pm {0.2} \) | \( {65.8} \pm {0.6} \) | \( {71.7} \pm {0.4} \) | \( {79.2} \pm {1.3} \) | 78.7 |
| ERM* | \( {97.6} \pm {0.3} \) | \( {67.9} \pm {0.7} \) | \( {70.9} \pm {0.2} \) | \( {74.0} \pm {0.6} \) | 77.6 |
| IRM* | \( {97.3} \pm {0.2} \) | \( {66.7} \pm {0.1} \) | \( {71.0} \pm {2.3} \) | \( {72.8} \pm {0.4} \) | 76.9 |
| GroupDRO* | \( {97.7} \pm {0.2} \) | \( {65.9} \pm {0.2} \) | \( {72.8} \pm {0.8} \) | \( {73.4} \pm {1.3} \) | 77.4 |
| Mixup* | \( {97.8} \pm {0.4} \) | \( {67.2} \pm {0.4} \) | \( {71.5} \pm {0.2} \) | \( {75.7} \pm {0.6} \) | 78.1 |
| MLDG* | \( {97.1} \pm {0.5} \) | \( {66.6} \pm {0.5} \) | \( {71.5} \pm {0.1} \) | \( {75.0} \pm {0.9} \) | 77.5 |
| CORAL* | \( {97.3} \pm {0.2} \) | \( {67.5} \pm {0.6} \) | \( {71.6} \pm {0.6} \) | \( {74.5} \pm {0.0} \) | 77.7 |
| MMD* | \( {98.8} \pm {0.0} \) | \( {66.4} \pm {0.4} \) | \( {70.8} \pm {0.5} \) | \( {75.6} \pm {0.4} \) | 77.9 |
| DANN* | \( {99.0} \pm {0.2} \) | \( {66.3} \pm {1.2} \) | \( {73.4} \pm {1.4} \) | \( {80.1} \pm {0.5} \) | 79.7 |
| CDANN* | \( {98.2} \pm {0.1} \) | \( {68.8} \pm {0.5} \) | \( {74.3} \pm {0.6} \) | \( {78.1} \pm {0.5} \) | 79.9 |
| MTL* | \( {97.9} \pm {0.7} \) | \( {66.1} \pm {0.7} \) | \( {72.0} \pm {0.4} \) | \( {74.9} \pm {1.1} \) | 77.7 |
| SagNet* | \( {97.4} \pm {0.3} \) | \( {66.4} \pm {0.4} \) | \( {71.6} \pm {0.1} \) | \( {75.0} \pm {0.8} \) | 77.6 |
| ARM* | \( {97.6} \pm {0.6} \) | \( {66.5} \pm {0.3} \) | \( {72.7} \pm {0.6} \) | \( {74.4} \pm {0.7} \) | 77.8 |
| VREx* | \( {98.4} \pm {0.2} \) | \( {66.4} \pm {0.7} \) | \( {72.8} \pm {0.1} \) | \( {75.0} \pm {1.4} \) | 78.1 |
| RSC* | \( {98.0} \pm {0.4} \) | \( {67.2} \pm {0.3} \) | \( {70.3} \pm {1.3} \) | \( {75.6} \pm {0.4} \) | 77.8 |
| DivCAM-S* | \( {98.0} \pm {0.5} \) | \( {66.1} \pm {0.3} \) | \( {72.0} \pm {1.0} \) | \( {76.4} \pm {0.7} \) | 78.1 |
| D-TRANSFORMERS* | \( {98.1} \pm {0.3} \) | \( {66.5} \pm {0.8} \) | \( {72.4} \pm {0.9} \) | \( {74.0} \pm {0.4} \) | 77.7 |
Table A.1: Domain-specific performance for the VLCS dataset using training-domain validation (top) and oracle validation, denoted with * (bottom). We use a ResNet-50 backbone, optimize with ADAM, and follow the distributions specified in DOMAINBED. Only RSC and our methods have been added as part of this work; the other baselines are taken from DOMAINBED.
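The two validation schemes in the caption differ only in which accuracy drives model selection. A minimal sketch with invented run scores (all numbers hypothetical):

```python
# Hypothetical hyperparameter runs as pairs of
# (held-out source-domain accuracy, held-out target-domain accuracy).
runs = [(88.1, 77.5), (87.4, 78.3), (89.0, 76.9)]

# Training-domain validation: select the run with the best *source*
# accuracy, then report its target accuracy.
train_domain_choice = max(runs, key=lambda r: r[0])[1]

# Oracle validation (rows marked *): select directly by *target* accuracy,
# an upper bound that peeks at the test domain.
oracle_choice = max(runs, key=lambda r: r[1])[1]

print(train_domain_choice, oracle_choice)  # 76.9 78.3
```

This is why the starred rows are typically a little higher: oracle selection cannot do worse than any other criterion on the reported metric.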
| Algorithm | A | C | \( \mathbf{P} \) | S | \( \mathbf{{Avg}.} \) |
| ERM | \( {84.7} \pm {0.4} \) | \( {80.8} \pm {0.6} \) | \( {97.2} \pm {0.3} \) | \( {79.3} \pm {1.0} \) | 85.5 |
| IRM | \( {84.8} \pm {1.3} \) | \( {76.4} \pm {1.1} \) | \( {96.7} \pm {0.6} \) | \( {76.1} \pm {1.0} \) | 83.5 |
| GroupDRO | \( {83.5} \pm {0.9} \) | \( {79.1} \pm {0.6} \) | \( {96.7} \pm {0.3} \) | \( {78.3} \pm {2.0} \) | 84.4 |
| Mixup | \( {86.1} \pm {0.5} \) | \( {78.9} \pm {0.8} \) | \( {97.6} \pm {0.1} \) | \( {75.8} \pm {1.8} \) | 84.6 |
| MLDG | \( {85.5} \pm {1.4} \) | \( {80.1} \pm {1.7} \) | \( {97.4} \pm {0.3} \) | \( {76.6} \pm {1.1} \) | 84.9 |
| CORAL | \( {88.3} \pm {0.2} \) | \( {80.0} \pm {0.5} \) | \( {97.5} \pm {0.3} \) | \( {78.8} \pm {1.3} \) | 86.2 |
| MMD | \( {86.1} \pm {1.4} \) | \( {79.4} \pm {0.9} \) | \( {96.6} \pm {0.2} \) | \( {76.5} \pm {0.5} \) | 84.6 |
| DANN | \( {86.4} \pm {0.8} \) | \( {77.4} \pm {0.8} \) | \( {97.3} \pm {0.4} \) | \( {73.5} \pm {2.3} \) | 83.6 |
| CDANN | \( {84.6} \pm {1.8} \) | \( {75.5} \pm {0.9} \) | \( {96.8} \pm {0.3} \) | \( {73.5} \pm {0.6} \) | 82.6 |
| MTL | \( {87.5} \pm {0.8} \) | \( {77.1} \pm {0.5} \) | \( {96.4} \pm {0.8} \) | \( {77.3} \pm {1.8} \) | 84.6 |
| SagNet | \( {87.4} \pm {1.0} \) | \( {80.7} \pm {0.6} \) | \( {97.1} \pm {0.1} \) | \( {80.0} \pm {0.4} \) | 86.3 |
| ARM | \( {86.8} \pm {0.6} \) | \( {76.8} \pm {0.5} \) | \( {97.4} \pm {0.3} \) | \( {79.3} \pm {1.2} \) | 85.1 |
| VREx | \( {86.0} \pm {1.6} \) | \( {79.1} \pm {0.6} \) | \( {96.9} \pm {0.5} \) | \( {77.7} \pm {1.7} \) | 84.9 |
| RSC | \( {85.4} \pm {0.8} \) | \( {79.7} \pm {1.8} \) | \( {97.6} \pm {0.3} \) | \( {78.2} \pm {1.2} \) | 85.2 |
| DivCAM-S | \( {86.2} \pm {1.4} \) | \( {79.1} \pm {2.2} \) | \( {97.3} \pm {0.4} \) | \( {79.2} \pm {0.1} \) | 85.4 |
| D-TRANSFORMERS | \( {86.9} \pm {0.8} \) | \( {78.2} \pm {1.7} \) | \( {96.6} \pm {0.7} \) | \( {75.1} \pm {0.5} \) | 84.2 |
| ERM* | \( {86.5} \pm {1.0} \) | \( {81.3} \pm {0.6} \) | \( {96.2} \pm {0.3} \) | \( {82.7} \pm {1.1} \) | 86.7 |
| IRM* | \( {84.2} \pm {0.9} \) | \( {79.7} \pm {1.5} \) | \( {95.9} \pm {0.4} \) | \( {78.3} \pm {2.1} \) | 84.5 |
| GroupDRO* | \( {87.5} \pm {0.5} \) | \( {82.9} \pm {0.6} \) | \( {97.1} \pm {0.3} \) | \( {81.1} \pm {1.2} \) | 87.1 |
| Mixup* | \( {87.5} \pm {0.4} \) | \( {81.6} \pm {0.7} \) | \( {97.4} \pm {0.2} \) | \( {80.8} \pm {0.9} \) | 86.8 |
| MLDG* | \( {87.0} \pm {1.2} \) | \( {82.5} \pm {0.9} \) | \( {96.7} \pm {0.3} \) | \( {81.2} \pm {0.6} \) | 86.8 |
| CORAL* | \( {86.6} \pm {0.8} \) | \( {81.8} \pm {0.9} \) | \( {97.1} \pm {0.5} \) | \( {82.7} \pm {0.6} \) | 87.1 |
| MMD* | \( {88.1} \pm {0.8} \) | \( {82.6} \pm {0.7} \) | \( {97.1} \pm {0.5} \) | \( {81.2} \pm {1.2} \) | 87.2 |
| DANN* | \( {87.0} \pm {0.4} \) | \( {80.3} \pm {0.6} \) | \( {96.8} \pm {0.3} \) | \( {76.9} \pm {1.1} \) | 85.2 |
| CDANN* | \( {87.7} \pm {0.6} \) | \( {80.7} \pm {1.2} \) | \( {97.3} \pm {0.4} \) | \( {77.6} \pm {1.5} \) | 85.8 |
| MTL* | \( {87.0} \pm {0.2} \) | \( {82.7} \pm {0.8} \) | \( {96.5} \pm {0.7} \) | \( {80.5} \pm {0.8} \) | 86.7 |
| SagNet* | \( {87.4} \pm {0.5} \) | \( {81.2} \pm {1.2} \) | \( {96.3} \pm {0.8} \) | \( {80.7} \pm {1.1} \) | 86.4 |
| ARM* | \( {85.0} \pm {1.2} \) | \( {81.4} \pm {0.2} \) | \( {95.9} \pm {0.3} \) | \( {80.9} \pm {0.5} \) | 85.8 |
| VREx* | \( {87.8} \pm {1.2} \) | \( {81.8} \pm {0.7} \) | \( {97.4} \pm {0.2} \) | \( {82.1} \pm {0.7} \) | 87.2 |
| RSC* | \( {86.0} \pm {0.7} \) | \( {81.8} \pm {0.9} \) | \( {96.8} \pm {0.7} \) | \( {80.4} \pm {0.5} \) | 86.2 |
| DivCAM-S* | \( {86.5} \pm {0.4} \) | \( {83.0} \pm {0.5} \) | \( {97.2} \pm {0.3} \) | \( {82.2} \pm {0.1} \) | 87.2 |
| D-TRANSFORMERS* | \( {87.8} \pm {0.6} \) | \( {81.6} \pm {0.3} \) | \( {97.2} \pm {0.5} \) | \( {80.9} \pm {0.5} \) | 86.9 |
Table A.2: Domain-specific performance on the PACS dataset using training-domain validation (top) and oracle validation, denoted with * (bottom). We use a ResNet-50 backbone, optimize with ADAM, and follow the distributions specified in DOMAINBED. Only RSC and our methods were added as part of this work; the other baselines are taken from DOMAINBED.
| Algorithm | A | C | P | \( \mathbf{R} \) | \( \mathbf{{Avg}.} \) |
| ERM | \( {61.3} \pm {0.7} \) | \( {52.4} \pm {0.3} \) | \( {75.8} \pm {0.1} \) | \( {76.6} \pm {0.3} \) | 66.5 |
| IRM | \( {58.9} \pm {2.3} \) | \( {52.2} \pm {1.6} \) | \( {72.1} \pm {2.9} \) | \( {74.0} \pm {2.5} \) | 64.3 |
| GroupDRO | \( {60.4} \pm {0.7} \) | \( {52.7} \pm {1.0} \) | \( {75.0} \pm {0.7} \) | \( {76.0} \pm {0.7} \) | 66.0 |
| Mixup | \( {62.4} \pm {0.8} \) | \( {54.8} \pm {0.6} \) | \( {76.9} \pm {0.3} \) | \( {78.3} \pm {0.2} \) | 68.1 |
| MLDG | \( {61.5} \pm {0.9} \) | \( {53.2} \pm {0.6} \) | \( {75.0} \pm {1.2} \) | \( {77.5} \pm {0.4} \) | 66.8 |
| CORAL | \( {65.3} \pm {0.4} \) | \( {54.4} \pm {0.5} \) | \( {76.5} \pm {0.1} \) | \( {78.4} \pm {0.5} \) | 68.7 |
| MMD | \( {60.4} \pm {0.2} \) | \( {53.3} \pm {0.3} \) | \( {74.3} \pm {0.1} \) | \( {77.4} \pm {0.6} \) | 66.3 |
| DANN | \( {59.9} \pm {1.3} \) | \( {53.0} \pm {0.3} \) | \( {73.6} \pm {0.7} \) | \( {76.9} \pm {0.5} \) | 65.9 |
| CDANN | \( {61.5} \pm {1.4} \) | \( {50.4} \pm {2.4} \) | \( {74.4} \pm {0.9} \) | \( {76.6} \pm {0.8} \) | 65.8 |
| MTL | \( {61.5} \pm {0.7} \) | \( {52.4} \pm {0.6} \) | \( {74.9} \pm {0.4} \) | \( {76.8} \pm {0.4} \) | 66.4 |
| SagNet | \( {63.4} \pm {0.2} \) | \( {54.8} \pm {0.4} \) | \( {75.8} \pm {0.4} \) | \( {78.3} \pm {0.3} \) | 68.1 |
| ARM | \( {58.9} \pm {0.8} \) | \( {51.0} \pm {0.5} \) | \( {74.1} \pm {0.1} \) | \( {75.2} \pm {0.3} \) | 64.8 |
| VREx | \( {60.7} \pm {0.9} \) | \( {53.0} \pm {0.9} \) | \( {75.3} \pm {0.1} \) | \( {76.6} \pm {0.5} \) | 66.4 |
| RSC | \( {60.7} \pm {1.4} \) | \( {51.4} \pm {0.3} \) | \( {74.8} \pm {1.1} \) | \( {75.1} \pm {1.3} \) | 65.5 |
| DivCAM-S | \( {59.5} \pm {0.3} \) | \( {49.7} \pm {0.1} \) | \( {75.4} \pm {0.7} \) | \( {76.2} \pm {0.2} \) | 65.2 |
| ERM* | \( {61.7} \pm {0.7} \) | \( {53.4} \pm {0.3} \) | \( {74.1} \pm {0.4} \) | \( {76.2} \pm {0.6} \) | 66.4 |
| IRM* | \( {56.4} \pm {3.2} \) | \( {51.2} \pm {2.3} \) | \( {71.7} \pm {2.7} \) | \( {72.7} \pm {2.7} \) | 63.0 |
| GroupDRO* | \( {60.5} \pm {1.6} \) | \( {53.1} \pm {0.3} \) | \( {75.5} \pm {0.3} \) | \( {75.9} \pm {0.7} \) | 66.2 |
| Mixup* | \( {63.5} \pm {0.2} \) | \( {54.6} \pm {0.4} \) | \( {76.0} \pm {0.3} \) | \( {78.0} \pm {0.7} \) | 68.0 |
| MLDG* | \( {60.5} \pm {0.7} \) | \( {54.2} \pm {0.5} \) | \( {75.0} \pm {0.2} \) | \( {76.7} \pm {0.5} \) | 66.6 |
| CORAL* | \( {64.8} \pm {0.8} \) | \( {54.1} \pm {0.9} \) | \( {76.5} \pm {0.4} \) | \( {78.2} \pm {0.4} \) | 68.4 |
| MMD* | \( {60.4} \pm {1.0} \) | \( {53.4} \pm {0.5} \) | \( {74.9} \pm {0.1} \) | \( {76.1} \pm {0.7} \) | 66.2 |
| DANN* | \( {60.6} \pm {1.4} \) | \( {51.8} \pm {0.7} \) | \( {73.4} \pm {0.5} \) | \( {75.5} \pm {0.9} \) | 65.3 |
| CDANN* | \( {57.9} \pm {0.2} \) | \( {52.1} \pm {1.2} \) | \( {74.9} \pm {0.7} \) | \( {76.2} \pm {0.2} \) | 65.3 |
| MTL* | \( {60.7} \pm {0.8} \) | \( {53.5} \pm {1.3} \) | \( {75.2} \pm {0.6} \) | \( {76.6} \pm {0.6} \) | 66.5 |
| SagNet* | \( {62.7} \pm {0.5} \) | \( {53.6} \pm {0.5} \) | \( {76.0} \pm {0.3} \) | \( {77.8} \pm {0.1} \) | 67.5 |
| ARM* | \( {58.8} \pm {0.5} \) | \( {51.8} \pm {0.7} \) | \( {74.0} \pm {0.1} \) | \( {74.4} \pm {0.2} \) | 64.8 |
| VREx* | \( {59.6} \pm {1.0} \) | \( {53.3} \pm {0.3} \) | \( {73.2} \pm {0.5} \) | \( {76.6} \pm {0.4} \) | 65.7 |
| RSC* | \( {61.7} \pm {0.8} \) | \( {53.0} \pm {0.9} \) | \( {74.8} \pm {0.8} \) | \( {76.3} \pm {0.5} \) | 66.5 |
| DivCAM-S* | \( {58.4} \pm {1.1} \) | \( {52.7} \pm {0.7} \) | \( {74.3} \pm {0.5} \) | \( {75.2} \pm {0.2} \) | 65.2 |
Table A.3: Domain-specific performance on the Office-Home dataset using training-domain validation (top) and oracle validation, denoted with * (bottom). We use a ResNet-50 backbone, optimize with ADAM, and follow the distributions specified in DOMAINBED. Only RSC and our methods were added as part of this work; the other baselines are taken from DOMAINBED.
| Algorithm | L100 | L38 | L43 | L46 | \( \mathbf{{Avg}.} \) |
| ERM | \( {49.8} \pm {4.4} \) | \( {42.1} \pm {1.4} \) | \( {56.9} \pm {1.8} \) | \( {35.7} \pm {3.9} \) | 46.1 |
| IRM | \( {54.6} \pm {1.3} \) | \( {39.8} \pm {1.9} \) | \( {56.2} \pm {1.8} \) | \( {39.6} \pm {0.8} \) | 47.6 |
| GroupDRO | \( {41.2} \pm {0.7} \) | \( {38.6} \pm {2.1} \) | \( {56.7} \pm {0.9} \) | \( {36.4} \pm {2.1} \) | 43.2 |
| Mixup | \( {59.6} \pm {2.0} \) | \( {42.2} \pm {1.4} \) | \( {55.9} \pm {0.8} \) | \( {33.9} \pm {1.4} \) | 47.9 |
| MLDG | \( {54.2} \pm {3.0} \) | \( {44.3} \pm {1.1} \) | \( {55.6} \pm {0.3} \) | \( {36.9} \pm {2.2} \) | 47.7 |
| CORAL | \( {51.6} \pm {2.4} \) | \( {42.2} \pm {1.0} \) | \( {57.0} \pm {1.0} \) | \( {39.8} \pm {2.9} \) | 47.6 |
| MMD | \( {41.9} \pm {3.0} \) | \( {34.8} \pm {1.0} \) | \( {57.0} \pm {1.9} \) | \( {35.2} \pm {1.8} \) | 42.2 |
| DANN | \( {51.1} \pm {3.5} \) | \( {40.6} \pm {0.6} \) | \( {57.4} \pm {0.5} \) | \( {37.7} \pm {1.8} \) | 46.7 |
| CDANN | \( {47.0} \pm {1.9} \) | \( {41.3} \pm {4.8} \) | \( {54.9} \pm {1.7} \) | \( {39.8} \pm {2.3} \) | 45.8 |
| MTL | \( {49.3} \pm {1.2} \) | \( {39.6} \pm {6.3} \) | \( {55.6} \pm {1.1} \) | \( {37.8} \pm {0.8} \) | 45.6 |
| SagNet | \( {53.0} \pm {2.9} \) | \( {43.0} \pm {2.5} \) | \( {57.9} \pm {0.6} \) | \( {40.4} \pm {1.3} \) | 48.6 |
| ARM | \( {49.3} \pm {0.7} \) | \( {38.3} \pm {2.4} \) | \( {55.8} \pm {0.8} \) | \( {38.7} \pm {1.3} \) | 45.5 |
| VREx | \( {48.2} \pm {4.3} \) | \( {41.7} \pm {1.3} \) | \( {56.8} \pm {0.8} \) | \( {38.7} \pm {3.1} \) | 46.4 |
| RSC | \( {50.2} \pm {2.2} \) | \( {39.2} \pm {1.4} \) | \( {56.3} \pm {1.4} \) | \( {40.8} \pm {0.6} \) | 46.6 |
| DivCAM-S | \( {51.6} \pm {2.2} \) | \( {44.4} \pm {2.1} \) | \( {55.2} \pm {1.7} \) | \( {40.7} \pm {2.6} \) | 48.0 |
| D-TRANSFORMERS | \( {53.7} \pm {1.0} \) | \( {29.4} \pm {2.5} \) | \( {53.9} \pm {1.0} \) | \( {34.5} \pm {3.1} \) | 42.9 |
| ERM* | \( {59.4} \pm {0.9} \) | \( {49.3} \pm {0.6} \) | \( {60.1} \pm {1.1} \) | \( {43.2} \pm {0.5} \) | 53.0 |
| IRM* | \( {56.5} \pm {2.5} \) | \( {49.8} \pm {1.5} \) | \( {57.1} \pm {2.2} \) | \( {38.6} \pm {1.0} \) | 50.5 |
| GroupDRO* | \( {60.4} \pm {1.5} \) | \( {48.3} \pm {0.4} \) | \( {58.6} \pm {0.8} \) | \( {42.2} \pm {0.8} \) | 52.4 |
| Mixup* | \( {67.6} \pm {1.8} \) | \( {51.0} \pm {1.3} \) | \( {59.0} \pm {0.0} \) | \( {40.0} \pm {1.1} \) | 54.4 |
| MLDG* | \( {59.2} \pm {0.1} \) | \( {49.0} \pm {0.9} \) | \( {58.4} \pm {0.9} \) | \( {41.4} \pm {1.0} \) | 52.0 |
| CORAL* | \( {60.4} \pm {0.9} \) | \( {47.2} \pm {0.5} \) | \( {59.3} \pm {0.4} \) | \( {44.4} \pm {0.4} \) | 52.8 |
| MMD* | \( {60.6} \pm {1.1} \) | \( {45.9} \pm {0.3} \) | \( {57.8} \pm {0.5} \) | \( {43.8} \pm {1.2} \) | 52.0 |
| DANN* | \( {55.2} \pm {1.9} \) | \( {47.0} \pm {0.7} \) | \( {57.2} \pm {0.9} \) | \( {42.9} \pm {0.9} \) | 50.6 |
| CDANN* | \( {56.3} \pm {2.0} \) | \( {47.1} \pm {0.9} \) | \( {57.2} \pm {1.1} \) | \( {42.4} \pm {0.8} \) | 50.8 |
| MTL* | \( {58.4} \pm {2.1} \) | \( {48.4} \pm {0.8} \) | \( {58.9} \pm {0.6} \) | \( {43.0} \pm {1.3} \) | 52.2 |
| SagNet* | \( {56.4} \pm {1.9} \) | \( {50.5} \pm {2.3} \) | \( {59.1} \pm {0.5} \) | \( {44.1} \pm {0.6} \) | 52.5 |
| ARM* | \( {60.1} \pm {1.5} \) | \( {48.3} \pm {1.6} \) | \( {55.3} \pm {0.6} \) | \( {40.9} \pm {1.1} \) | 51.2 |
| VREx* | \( {56.8} \pm {1.7} \) | \( {46.5} \pm {0.5} \) | \( {58.4} \pm {0.3} \) | \( {43.8} \pm {0.3} \) | 51.4 |
| RSC* | \( {59.9} \pm {1.4} \) | \( {46.7} \pm {0.4} \) | \( {57.8} \pm {0.5} \) | \( {44.3} \pm {0.6} \) | 52.1 |
| DivCAM-S* | \( {57.7} \pm {1.1} \) | \( {46.0} \pm {1.3} \) | \( {58.9} \pm {0.4} \) | \( {42.5} \pm {0.7} \) | 51.3 |
| D-TRANSFORMERS* | \( {59.7} \pm {0.6} \) | \( {51.1} \pm {1.4} \) | \( {56.5} \pm {0.4} \) | \( {42.2} \pm {1.0} \) | 52.4 |
Table A.4: Domain-specific performance on the Terra Incognita dataset using training-domain validation (top) and oracle validation, denoted with * (bottom). We use a ResNet-50 backbone, optimize with ADAM, and follow the distributions specified in DOMAINBED. Only RSC and our methods were added as part of this work; the other baselines are taken from DOMAINBED.
| Algorithm | clip | info | paint | quick | real | sketch | Avg. |
| ERM | \( {58.1} \pm {0.3} \) | \( {18.8} \pm {0.3} \) | \( {46.7} \pm {0.3} \) | \( {12.2} \pm {0.4} \) | \( {59.6} \pm {0.1} \) | \( {49.8} \pm {0.4} \) | 40.9 |
| IRM | \( {48.5} \pm {2.8} \) | \( {15.0} \pm {1.5} \) | \( {38.3} \pm {4.3} \) | \( {10.9} \pm {0.5} \) | \( {48.2} \pm {5.2} \) | \( {42.3} \pm {3.1} \) | 33.9 |
| GroupDRO | \( {47.2} \pm {0.5} \) | \( {17.5} \pm {0.4} \) | \( {33.8} \pm {0.5} \) | \( {9.3} \pm {0.3} \) | \( {51.6} \pm {0.4} \) | \( {40.1} \pm {0.6} \) | 33.3 |
| Mixup | \( {55.7} \pm {0.3} \) | \( {18.5} \pm {0.5} \) | \( {44.3} \pm {0.5} \) | \( {12.5} \pm {0.4} \) | \( {55.8} \pm {0.3} \) | \( {48.2} \pm {0.5} \) | 39.2 |
| MLDG | \( {59.1} \pm {0.2} \) | \( {19.1} \pm {0.3} \) | \( {45.8} \pm {0.7} \) | \( {13.4} \pm {0.3} \) | \( {59.6} \pm {0.2} \) | \( {50.2} \pm {0.4} \) | 41.2 |
| CORAL | \( {59.2} \pm {0.1} \) | \( {19.7} \pm {0.2} \) | \( {46.6} \pm {0.3} \) | \( {13.4} \pm {0.4} \) | \( {59.8} \pm {0.2} \) | \( {50.1} \pm {0.6} \) | 41.5 |
| MMD | \( {32.1} \pm {13.3} \) | \( {11.0} \pm {4.6} \) | \( {26.8} \pm {11.3} \) | \( {8.7} \pm {2.1} \) | \( {32.7} \pm {13.8} \) | \( {28.9} \pm {11.9} \) | 23.4 |
| DANN | \( {53.1} \pm {0.2} \) | \( {18.3} \pm {0.1} \) | \( {44.2} \pm {0.7} \) | \( {11.8} \pm {0.1} \) | \( {55.5} \pm {0.4} \) | \( {46.8} \pm {0.6} \) | 38.3 |
| CDANN | \( {54.6} \pm {0.4} \) | \( {17.3} \pm {0.1} \) | \( {43.7} \pm {0.9} \) | \( {12.1} \pm {0.7} \) | \( {56.2} \pm {0.4} \) | \( {45.9} \pm {0.5} \) | 38.3 |
| MTL | \( {57.9} \pm {0.5} \) | \( {18.5} \pm {0.4} \) | \( {46.0} \pm {0.1} \) | \( {12.5} \pm {0.1} \) | \( {59.5} \pm {0.3} \) | \( {49.2} \pm {0.1} \) | 40.6 |
| SagNet | \( {57.7} \pm {0.3} \) | \( {19.0} \pm {0.2} \) | \( {45.3} \pm {0.3} \) | \( {12.7} \pm {0.5} \) | \( {58.1} \pm {0.5} \) | \( {48.8} \pm {0.2} \) | 40.3 |
| ARM | \( {49.7} \pm {0.3} \) | \( {16.3} \pm {0.5} \) | \( {40.9} \pm {1.1} \) | \( {9.4} \pm {0.1} \) | \( {53.4} \pm {0.4} \) | \( {43.5} \pm {0.4} \) | 35.5 |
| VREx | \( {47.3} \pm {3.5} \) | \( {16.0} \pm {1.5} \) | \( {35.8} \pm {4.6} \) | \( {10.9} \pm {0.3} \) | \( {49.6} \pm {4.9} \) | \( {42.0} \pm {3.0} \) | 33.6 |
| RSC | \( {55.0} \pm {1.2} \) | \( {18.3} \pm {0.5} \) | \( {44.4} \pm {0.6} \) | \( {12.2} \pm {0.2} \) | \( {55.7} \pm {0.7} \) | \( {47.8} \pm {0.9} \) | 38.9 |
| DivCAM-S | \( {57.7} \pm {0.3} \) | \( {19.3} \pm {0.3} \) | \( {46.8} \pm {0.2} \) | \( {12.7} \pm {0.4} \) | \( {58.9} \pm {0.2} \) | \( {48.5} \pm {0.4} \) | 40.7 |
| ERM* | \( {58.6} \pm {0.3} \) | \( {19.2} \pm {0.2} \) | \( {47.0} \pm {0.3} \) | \( {13.2} \pm {0.2} \) | \( {59.9} \pm {0.3} \) | \( {49.8} \pm {0.4} \) | 41.3 |
| IRM* | \( {40.4} \pm {6.6} \) | \( {12.1} \pm {2.7} \) | \( {31.4} \pm {5.7} \) | \( {9.8} \pm {1.2} \) | \( {37.7} \pm {9.0} \) | \( {36.7} \pm {5.3} \) | 28.0 |
| GroupDRO* | \( {47.2} \pm {0.5} \) | \( {17.5} \pm {0.4} \) | \( {34.2} \pm {0.3} \) | \( {9.2} \pm {0.4} \) | \( {51.9} \pm {0.5} \) | \( {40.1} \pm {0.6} \) | 33.4 |
| Mixup* | \( {55.6} \pm {0.1} \) | \( {18.7} \pm {0.4} \) | \( {45.1} \pm {0.5} \) | \( {12.8} \pm {0.3} \) | \( {57.6} \pm {0.5} \) | \( {48.2} \pm {0.4} \) | 39.6 |
| MLDG* | \( {59.3} \pm {0.1} \) | \( {19.6} \pm {0.2} \) | \( {46.8} \pm {0.2} \) | \( {13.4} \pm {0.2} \) | \( {60.1} \pm {0.4} \) | \( {50.4} \pm {0.3} \) | 41.6 |
| CORAL* | \( {59.2} \pm {0.1} \) | \( {19.9} \pm {0.2} \) | \( {47.4} \pm {0.2} \) | \( {14.0} \pm {0.4} \) | \( {59.8} \pm {0.2} \) | \( {50.4} \pm {0.4} \) | 41.8 |
| MMD* | \( {32.2} \pm {13.3} \) | \( {11.2} \pm {4.5} \) | \( {26.8} \pm {11.3} \) | \( {8.8} \pm {2.2} \) | \( {32.7} \pm {13.8} \) | \( {29.0} \pm {11.8} \) | 23.5 |
| DANN* | \( {53.1} \pm {0.2} \) | \( {18.3} \pm {0.1} \) | \( {44.2} \pm {0.7} \) | \( {11.9} \pm {0.1} \) | \( {55.5} \pm {0.4} \) | \( {46.8} \pm {0.6} \) | 38.3 |
| CDANN* | \( {54.6} \pm {0.4} \) | \( {17.3} \pm {0.1} \) | \( {44.2} \pm {0.7} \) | \( {12.8} \pm {0.2} \) | \( {56.2} \pm {0.4} \) | \( {45.9} \pm {0.5} \) | 38.5 |
| MTL* | \( {58.0} \pm {0.4} \) | \( {19.2} \pm {0.2} \) | \( {46.2} \pm {0.1} \) | \( {12.7} \pm {0.2} \) | \( {59.9} \pm {0.1} \) | \( {49.0} \pm {0.0} \) | 40.8 |
| SagNet* | \( {57.7} \pm {0.3} \) | \( {19.1} \pm {0.1} \) | \( {46.3} \pm {0.5} \) | \( {13.5} \pm {0.4} \) | \( {58.9} \pm {0.4} \) | \( {49.5} \pm {0.2} \) | 40.8 |
| ARM* | \( {49.6} \pm {0.4} \) | \( {16.5} \pm {0.3} \) | \( {41.5} \pm {0.8} \) | \( {10.8} \pm {0.1} \) | \( {53.5} \pm {0.3} \) | \( {43.9} \pm {0.4} \) | 36.0 |
| VREx* | \( {43.3} \pm {4.5} \) | \( {14.1} \pm {1.8} \) | \( {32.5} \pm {5.0} \) | \( {9.8} \pm {1.1} \) | \( {43.5} \pm {5.6} \) | \( {37.7} \pm {4.5} \) | 30.1 |
| RSC* | \( {55.0} \pm {1.2} \) | \( {18.3} \pm {0.5} \) | \( {44.4} \pm {0.6} \) | \( {12.5} \pm {0.1} \) | \( {55.7} \pm {0.7} \) | \( {47.8} \pm {0.9} \) | 38.9 |
| DivCAM-S* | \( {57.8} \pm {0.2} \) | \( {19.3} \pm {0.3} \) | \( {47.0} \pm {0.1} \) | \( {13.1} \pm {0.3} \) | \( {59.6} \pm {0.1} \) | \( {48.9} \pm {0.2} \) | 41.0 |
Table A.5: Domain-specific performance on the DomainNet dataset using training-domain validation (top) and oracle validation, denoted with * (bottom). We use a ResNet-50 backbone, optimize with ADAM, and follow the distributions specified in DOMAINBED. Only RSC and our methods were added as part of this work; the other baselines are taken from DOMAINBED.
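The per-domain entries in Tables A.2 to A.5 report the mean ± standard deviation over repeated runs, and the Avg. column is the unweighted mean of the per-domain means. As a minimal sketch of that aggregation, assuming made-up accuracy values for two hypothetical held-out domains (none of the numbers below are taken from the tables), the summary statistics could be computed as:

```python
import statistics

# Hypothetical raw test accuracies (%) from three runs per held-out domain,
# mimicking the DomainBed-style protocol used for Tables A.2-A.5.
runs = {
    "clip":   [58.1, 58.4, 57.8],
    "sketch": [49.9, 50.3, 49.5],
}

def summarize(accs):
    """Return (mean, sample std dev), rounded to one decimal as in the tables."""
    return round(statistics.mean(accs), 1), round(statistics.stdev(accs), 1)

per_domain = {domain: summarize(accs) for domain, accs in runs.items()}

# The Avg. column is the unweighted mean of the per-domain mean accuracies.
avg = round(statistics.mean(m for m, _ in per_domain.values()), 1)

for domain, (mean, std) in per_domain.items():
    print(f"{domain}: {mean} \u00b1 {std}")
print(f"Avg.: {avg}")
```

Note that this averages the rounded per-domain means; averaging unrounded means, as an actual evaluation harness would, can shift the last digit of the Avg. column.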

Figure B.1: Pairwise learned prototypes of the best-performing model on each test domain.

Figure B.2: Pairwise learned prototypes of the best-performing model on each test domain.

Figure B.3: Pairwise learned prototypes of the best-performing model on each test domain.

Figure B.4: Pairwise learned prototypes of the best-performing model on each test domain.

Figure B.5: Pairwise learned prototypes of the best-performing model on each test domain.

Figure B.6: Pairwise learned prototypes of the best-performing model on each test domain.

Figure B.7: Pairwise learned prototypes of the best-performing model on each test domain.

Figure B.8: Pairwise learned prototype negative weights of the best-performing model on each test domain.

Figure B.9: Pairwise learned prototype negative weights of the best-performing model on each test domain.

Figure B.10: Pairwise learned prototype negative weights of the best-performing model on each test domain.